Preface



Linear algebra has two aspects. Abstractly, it is the study of vector spaces over 
fields, and their linear maps and bilinear forms. Concretely, it is matrix theory: 
matrices occur in all parts of mathematics and its applications, and everyone work- 
ing in the mathematical sciences and related areas needs to be able to diagonalise 
a real symmetric matrix. So in a course of this kind, it is necessary to touch on 
both the abstract and the concrete aspects, though applications are not treated in 
detail. 

On the theoretical side, we deal with vector spaces, linear maps, and bilin- 
ear forms. Vector spaces over a field K are particularly attractive algebraic ob- 
jects, since each vector space is completely determined by a single number, its 
dimension (unlike groups, for example, whose structure is much more compli- 
cated). Linear maps are the structure-preserving maps or homomorphisms of vec- 
tor spaces. 

On the practical side, the subject is really about one thing: matrices. If we need 
to do some calculation with a linear map or a bilinear form, we must represent it 
by a matrix. As this suggests, matrices represent several different kinds of things. 
In each case, the representation is not unique, since we have the freedom to change 
bases in our vector spaces; so many different matrices represent the same object. 
This gives rise to several equivalence relations on the set of matrices, summarised 
in the following table: 



Relation                Objects represented                         Transformation         Condition

Equivalence             Same linear map $\alpha : V \to W$          $A' = Q^{-1}AP$        P, Q invertible

Similarity              Same linear map $\alpha : V \to V$          $A' = P^{-1}AP$        P invertible

Congruence              Same bilinear form b on V                   $A' = P^{T}AP$         P invertible

Orthogonal similarity   Same self-adjoint $\alpha : V \to V$        $A' = P^{-1}AP$        P orthogonal
                        w.r.t. orthonormal basis



The power of linear algebra in practice stems from the fact that we can choose 
bases so as to simplify the form of the matrix representing the object in question. 
We will see several such "canonical form theorems" in the notes. 
These lecture notes correspond to the course Linear Algebra II, as given at 
Queen Mary, University of London, in the first semester 2005-6. 
The course description reads as follows: 

This module is a mixture of abstract theory, with rigorous proofs, and 
concrete calculations with matrices. The abstract component builds 
on the notions of subspaces and linear maps to construct the theory 
of bilinear forms i.e. functions of two variables which are linear in 
each variable, dual spaces (which consist of linear mappings from the 
original space to the underlying field) and determinants. The concrete 
applications involve ways to reduce a matrix of some specific type 
(such as symmetric or skew-symmetric) to as near diagonal form as 
possible. 

In other words, students on this course have met the basic concepts of linear al- 
gebra before. Of course, some revision is necessary, and I have tried to make the 
notes reasonably self-contained. If you are reading them without the benefit of a 
previous course on linear algebra, you will almost certainly have to do some work 
filling in the details of arguments which are outlined or skipped over here. 

The notes for the prerequisite course, Linear Algebra I, by Dr Francis Wright, 
are currently available from 

http://centaur.maths.qmul.ac.uk/Lin_Alg_I/ 

I have by-and-large kept to the notation of these notes. For example, a general 
field is called K, vectors are represented as column vectors, linear maps (apart 
from zero and the identity) are represented by Greek letters. 

I have included in the appendices some extra-curricular applications of lin- 
ear algebra, including some special determinants, the method for solving a cubic 
equation, the proof of the "Friendship Theorem" and the problem of deciding the 
winner of a football league, as well as some worked examples. 



Peter J. Cameron 
September 5, 2008 
Chapter 1 
Vector spaces 



These notes are about linear maps and bilinear forms on vector spaces, how we 
represent them by matrices, how we manipulate them, and what we use this for. 

1.1 Definitions 

Definition 1.1 A field is an algebraic system consisting of a non-empty set K 
equipped with two binary operations + (addition) and · (multiplication) satisfying 
the conditions: 

(A) (K, +) is an abelian group with identity element 0 (called zero); 

(M) (K \ {0}, ·) is an abelian group with identity element 1; 

(D) the distributive law 

\[ a(b + c) = ab + ac \]

holds for all $a, b, c \in K$. 

If you don't know what an abelian group is, then you can find it spelled out in 
detail in Appendix A. In fact, the only fields that I will use in these notes are 

• Q, the field of rational numbers; 

• R, the field of real numbers; 

• C, the field of complex numbers; 

• Fp, the field of integers mod p, where p is a prime number. 

I will not stop to prove that these structures really are fields. You may have seen 
Fp referred to as Zp. 
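
As a small illustration of arithmetic in the field of integers mod p, here is a minimal Python sketch. The function name `inverse_mod` is our own, and the use of Fermat's little theorem to find inverses is one standard choice, not something these notes prescribe.

```python
def inverse_mod(a: int, p: int) -> int:
    """Multiplicative inverse of a in F_p, assuming p is prime."""
    if a % p == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse in F_p")
    # Fermat's little theorem: a^(p-1) = 1 (mod p), so a^(p-2) inverts a.
    return pow(a, p - 2, p)


p = 7
inverses = {a: inverse_mod(a, p) for a in range(1, p)}

# Condition (M): every non-zero element of F_7 has a multiplicative inverse.
assert all((a * b) % p == 1 for a, b in inverses.items())
```

The check at the end confirms condition (M) for p = 7 only; it is an experiment, not a proof that Fp is a field.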



Definition 1.2 A vector space V over a field K is an algebraic system consisting 
of a non-empty set V equipped with a binary operation + (vector addition), and 
an operation of scalar multiplication 

\[ (a, v) \in K \times V \mapsto av \in V \]

such that the following rules hold: 

(VA) (V, +) is an abelian group, with identity element 0 (the zero vector). 

(VM) Rules for scalar multiplication: 

(VM0) For any $a \in K$, $v \in V$, there is a unique element $av \in V$. 

(VM1) For any $a \in K$, $u, v \in V$, we have $a(u + v) = au + av$. 

(VM2) For any $a, b \in K$, $v \in V$, we have $(a + b)v = av + bv$. 

(VM3) For any $a, b \in K$, $v \in V$, we have $(ab)v = a(bv)$. 

(VM4) For any $v \in V$, we have $1v = v$ (where 1 is the identity element of K). 

Since we have two kinds of elements, namely elements of K and elements of 
V, we distinguish them by calling the elements of K scalars and the elements of 
V vectors. 

A vector space over the field R is often called a real vector space, and one 
over C is a complex vector space. 

Example 1.1 The first example of a vector space that we meet is the Euclidean 
plane $\mathbb{R}^2$. This is a real vector space. This means that we can add two vectors, and 
multiply a vector by a scalar (a real number). There are two ways we can make 
these definitions. 

• The geometric definition. Think of a vector as an arrow starting at the origin 
and ending at a point of the plane. Then addition of two vectors is done by 
the parallelogram law (see Figure 1.1). The scalar multiple av is the vector 
whose length is |a| times the length of v, in the same direction if a > 0 and 
in the opposite direction if a < 0. 

• The algebraic definition. We represent the points of the plane by Cartesian 
coordinates (x, y). Thus, a vector v is just a pair (x, y) of real numbers. Now 
we define addition and scalar multiplication by 

\[
\begin{aligned}
(x_1, y_1) + (x_2, y_2) &= (x_1 + x_2,\ y_1 + y_2), \\
a(x, y) &= (ax, ay).
\end{aligned}
\]

Figure 1.1: The parallelogram law 

Not only is this definition much simpler, but it is much easier to check that 
the rules for a vector space are really satisfied! For example, we check the 
law $a(v + w) = av + aw$. Let $v = (x_1, y_1)$ and $w = (x_2, y_2)$. Then we have 

\[
\begin{aligned}
a(v + w) &= a((x_1, y_1) + (x_2, y_2)) \\
&= a(x_1 + x_2,\ y_1 + y_2) \\
&= (ax_1 + ax_2,\ ay_1 + ay_2) \\
&= (ax_1, ay_1) + (ax_2, ay_2) \\
&= av + aw.
\end{aligned}
\]

In the algebraic definition, we say that the operations of addition and scalar 
multiplication are coordinatewise: this means that we add two vectors coordinate 
by coordinate, and similarly for scalar multiplication. 

Using coordinates, this example can be generalised. 

Example 1.2 Let n be any positive integer and K any field. Let $V = K^n$, the set 
of all n-tuples of elements of K. Then V is a vector space over K, where the 
operations are defined coordinatewise: 

\[
\begin{aligned}
(a_1, a_2, \dots, a_n) + (b_1, b_2, \dots, b_n) &= (a_1 + b_1,\ a_2 + b_2,\ \dots,\ a_n + b_n), \\
c(a_1, a_2, \dots, a_n) &= (ca_1, ca_2, \dots, ca_n).
\end{aligned}
\]
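
The coordinatewise operations can be sketched in a few lines of Python. This is only an illustration: the names `vec_add` and `scalar_mul` are ours, and we take K to be the rationals (via `fractions.Fraction`) so that the arithmetic is exact.

```python
from fractions import Fraction


def vec_add(u, v):
    """Coordinatewise sum of two n-tuples of the same length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of coordinates")
    return tuple(a + b for a, b in zip(u, v))


def scalar_mul(c, v):
    """Coordinatewise product of the scalar c with the n-tuple v."""
    return tuple(c * a for a in v)


u = (Fraction(1), Fraction(2), Fraction(-3))
v = (Fraction(1, 2), Fraction(0), Fraction(4))
c = Fraction(3)

# Rule (VM1): c(u + v) = cu + cv, checked on one example.
assert scalar_mul(c, vec_add(u, v)) == vec_add(scalar_mul(c, u), scalar_mul(c, v))
```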

1.2 Bases 

This example is much more general than it appears: Every finite-dimensional vec- 
tor space looks like Example 1.2. Here's why. 

Definition 1.3 Let V be a vector space over the field K, and let $v_1, \dots, v_n$ be 
vectors in V. 

(a) The vectors $v_1, v_2, \dots, v_n$ are linearly independent if, whenever we have 
scalars $c_1, c_2, \dots, c_n$ satisfying 

\[ c_1 v_1 + c_2 v_2 + \cdots + c_n v_n = 0, \]

then necessarily $c_1 = c_2 = \cdots = c_n = 0$. 

(b) The vectors $v_1, v_2, \dots, v_n$ are spanning if, for every vector $v \in V$, we can find 
scalars $c_1, c_2, \dots, c_n \in K$ such that 

\[ v = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n. \]

In this case, we write $V = \langle v_1, v_2, \dots, v_n \rangle$. 

(c) The vectors $v_1, v_2, \dots, v_n$ form a basis for V if they are linearly independent 
and spanning. 

Remark Linear independence is a property of a list of vectors. A list containing 
the zero vector is never linearly independent. Also, a list in which the same vector 
occurs more than once is never linearly independent. 
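
For vectors in Q^n, linear independence can be tested mechanically. The sketch below is an assumption-laden illustration: it uses Gaussian elimination and the standard fact (not yet proved at this point in the notes) that a list of vectors is linearly independent exactly when elimination leaves no zero row. The names `rank` and `is_linearly_independent` are ours.

```python
from fractions import Fraction


def rank(vectors):
    """Number of non-zero rows left after Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in vectors]
    r = 0  # number of pivot rows found so far
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def is_linearly_independent(vectors):
    return rank(vectors) == len(vectors)


assert is_linearly_independent([(1, 0), (0, 1)])
# A list containing the zero vector is never linearly independent.
assert not is_linearly_independent([(1, 2), (0, 0)])
# Neither is a list in which the same vector occurs twice.
assert not is_linearly_independent([(1, 2), (1, 2)])
```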

I will say "Let $B = (v_1, \dots, v_n)$ be a basis for V" to mean that the list of vectors 
$v_1, \dots, v_n$ is a basis, and to refer to this list as B. 

Definition 1.4 Let V be a vector space over the field K. We say that V is finite-dimensional 
if we can find vectors $v_1, v_2, \dots, v_n \in V$ which form a basis for V. 

Remark In these notes we are only concerned with finite-dimensional vector 
spaces. If you study Functional Analysis, Quantum Mechanics, or various other 
subjects, you will meet vector spaces which are not finite dimensional. 

Proposition 1.1 The following three conditions are equivalent for the vectors 
$v_1, \dots, v_n$ of the vector space V over K: 

(a) $v_1, \dots, v_n$ is a basis; 

(b) $v_1, \dots, v_n$ is a maximal linearly independent set (that is, if we add any vector 
to the list, then the result is no longer linearly independent); 

(c) $v_1, \dots, v_n$ is a minimal spanning set (that is, if we remove any vector from 
the list, then the result is no longer spanning). 

The next theorem helps us to understand the properties of linear independence. 
Theorem 1.2 (The Exchange Lemma) Let V be a vector space over K. Suppose 
that the vectors $v_1, \dots, v_n$ are linearly independent, and that the vectors $w_1, \dots, w_m$ 
are linearly independent, where m > n. Then we can find a number i with $1 \le i \le m$ 
such that the vectors $v_1, \dots, v_n, w_i$ are linearly independent. 

In order to prove this, we need a lemma about systems of equations. 

Lemma 1.3 Given a system (*) 

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1m}x_m &= 0, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2m}x_m &= 0, \\
&\ \,\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nm}x_m &= 0
\end{aligned}
\]

of homogeneous linear equations, where the number n of equations is strictly less 
than the number m of variables, there exists a non-zero solution $(x_1, \dots, x_m)$ (that 
is, $x_1, \dots, x_m$ are not all zero). 

Proof This is proved by induction on the number of variables. If the coefficients 
$a_{11}, a_{21}, \dots, a_{n1}$ of $x_1$ are all zero, then putting $x_1 = 1$ and the other variables zero 
gives a solution. If one of these coefficients is non-zero, then we can use the 
corresponding equation to express $x_1$ in terms of the other variables, obtaining 
n - 1 equations in m - 1 variables. By hypothesis, n - 1 < m - 1. So by the 
induction hypothesis, these new equations have a non-zero solution. Computing 
the value of $x_1$ gives a solution to the original equations. 
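
A non-zero solution can also be computed explicitly. The following Python sketch is not the inductive argument of the proof: it swaps in full Gaussian elimination over Q, then sets one free variable to 1. The function name `nonzero_solution` is hypothetical.

```python
from fractions import Fraction


def nonzero_solution(a):
    """Return a non-zero x with a x = 0, for an n x m matrix a with n < m."""
    n, m = len(a), len(a[0])
    if n >= m:
        raise ValueError("need strictly fewer equations than variables")
    rows = [[Fraction(x) for x in row] for row in a]
    pivots = []  # pivots[i] is the pivot column of reduced row i
    r = 0
    for col in range(m):
        p = next((i for i in range(r, n) if rows[i][col] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][col]
        rows[r] = [x / pivot for x in rows[r]]
        for i in range(n):
            if i != r and rows[i][col] != 0:
                f = rows[i][col]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    # At most n < m columns carry a pivot, so some column is free.
    free = next(c for c in range(m) if c not in pivots)
    x = [Fraction(0)] * m
    x[free] = Fraction(1)
    for i, col in enumerate(pivots):
        x[col] = -rows[i][free]
    return x


a = [[1, 2, 3], [4, 5, 6]]
x = nonzero_solution(a)
assert any(xi != 0 for xi in x)
assert all(sum(c * xi for c, xi in zip(row, x)) == 0 for row in a)
```

When the first column is entirely zero, the free variable chosen is $x_1$, which reproduces the first case of the proof.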

Now we turn to the proof of the Exchange Lemma. Let us argue for a contradiction, 
by assuming that the result is false: that is, assume that none of the vectors 
$w_i$ can be added to the list $(v_1, \dots, v_n)$ to produce a larger linearly independent list. 
This means that, for all i, the list $(v_1, \dots, v_n, w_i)$ is linearly dependent. So there 
are coefficients $c_1, \dots, c_n, d$, not all zero, such that 

\[ c_1 v_1 + \cdots + c_n v_n + d w_i = 0. \]

We cannot have d = 0; for this would mean that we had a linear combination of 
$v_1, \dots, v_n$ equal to zero, contrary to the hypothesis that these vectors are linearly 
independent. So we can divide the equation through by d, and take $w_i$ to the other 
side, to obtain (changing notation slightly) 

\[ w_i = a_{1i}v_1 + a_{2i}v_2 + \cdots + a_{ni}v_n = \sum_{j=1}^{n} a_{ji}v_j. \]
We do this for each value of $i = 1, \dots, m$. 

Now take a non-zero solution to the set of equations (*) above: that is, 

\[ \sum_{i=1}^{m} a_{ji}x_i = 0 \quad \text{for } j = 1, \dots, n. \]

Multiplying the formula for $w_i$ by $x_i$ and adding, we obtain 

\[ x_1 w_1 + \cdots + x_m w_m = \sum_{j=1}^{n} \left( \sum_{i=1}^{m} a_{ji}x_i \right) v_j = 0. \]

But the coefficients $x_1, \dots, x_m$ are not all zero, so this means that the vectors $(w_1, \dots, w_m)$ 
are linearly dependent, contrary to hypothesis. 

So the assumption that no $w_i$ can be added to $(v_1, \dots, v_n)$ to get a linearly 
independent set must be wrong, and the proof is complete. 

The Exchange Lemma has some important consequences: 

Corollary 1.4 Let V be a finite-dimensional vector space over a field K. Then 

(a) any two bases of V have the same number of elements; 

(b) any linearly independent set can be extended to a basis. 

The number of elements in a basis is called the dimension of the vector space 
V. We will say "an n-dimensional vector space" instead of "a finite-dimensional 
vector space whose dimension is n". We denote the dimension of V by dim(V). 

Proof Let us see how the corollary follows from the Exchange Lemma. 

(a) Let $(v_1, \dots, v_n)$ and $(w_1, \dots, w_m)$ be two bases for V. Suppose, for a contradiction, 
that they have different numbers of elements; say that n < m, without 
loss of generality. Both lists of vectors are linearly independent; so, according to 
the Exchange Lemma, we can add some vector $w_i$ to the first list to get a larger 
linearly independent list. This means that $v_1, \dots, v_n$ was not a maximal linearly 
independent set, and so (by Proposition 1.1) not a basis, contradicting our assumption. 
We conclude that m = n, as required. 

(b) Let $(v_1, \dots, v_n)$ be linearly independent and let $(w_1, \dots, w_m)$ be a basis. 
Necessarily $n \le m$, since otherwise we could add one of the vs to $(w_1, \dots, w_m)$ to 
get a larger linearly independent set, contradicting maximality. But now we can 
add some ws to $(v_1, \dots, v_n)$ until we obtain a basis. 
Remark We allow the possibility that a vector space has dimension zero. Such 
a vector space contains just one vector, the zero vector 0; a basis for this vector 
space consists of the empty set. 

Now let V be an n-dimensional vector space over K. This means that there is a 
basis $v_1, v_2, \dots, v_n$ for V. Since this list of vectors is spanning, every vector $v \in V$ 
can be expressed as 

\[ v = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n \]

for some scalars $c_1, c_2, \dots, c_n \in K$. The scalars $c_1, \dots, c_n$ are the coordinates of 
v (with respect to the given basis), and the coordinate representation of v is the 
n-tuple 

\[ (c_1, c_2, \dots, c_n) \in K^n. \]

Now the coordinate representation is unique. For suppose that we also had 

\[ v = c'_1 v_1 + c'_2 v_2 + \cdots + c'_n v_n \]

for scalars $c'_1, c'_2, \dots, c'_n$. Subtracting these two expressions, we obtain 

\[ 0 = (c_1 - c'_1)v_1 + (c_2 - c'_2)v_2 + \cdots + (c_n - c'_n)v_n. \]

Now the vectors $v_1, v_2, \dots, v_n$ are linearly independent; so this equation implies 
that $c_1 - c'_1 = 0$, $c_2 - c'_2 = 0$, ..., $c_n - c'_n = 0$; that is, 

\[ c_1 = c'_1, \quad c_2 = c'_2, \quad \dots, \quad c_n = c'_n. \]

Now it is easy to check that, when we add two vectors in V, we add their 
coordinate representations in $K^n$ (using coordinatewise addition); and when we 
multiply a vector $v \in V$ by a scalar c, we multiply its coordinate representation 
by c. In other words, addition and scalar multiplication in V translate to the same 
operations on their coordinate representations. This is why we only need to consider 
vector spaces of the form $K^n$, as in Example 1.2. 

Here is how the result would be stated in the language of abstract algebra: 

Theorem 1.5 Any n-dimensional vector space over a field K is isomorphic to the 
vector space $K^n$. 

1.3 Row and column vectors 

The elements of the vector space $K^n$ are all the n-tuples of scalars from the field 
K. There are two different ways that we can represent an n-tuple: as a row, or as 
a column. Thus, the vector with components 1, 2 and -3 can be represented as a 
row vector 

\[ \begin{bmatrix} 1 & 2 & -3 \end{bmatrix} \]

or as a column vector 

\[ \begin{bmatrix} 1 \\ 2 \\ -3 \end{bmatrix}. \]

(Note that we use square brackets, rather than round brackets or parentheses. But 
you will see the notation (1, 2, -3) and the equivalent for columns in other books!) 

Both systems are in common use, and you should be familiar with both. The 
choice of row or column vectors makes some technical differences in the state- 
ments of the theorems, so care is needed. 

There are arguments for and against both systems. Those who prefer row 
vectors would argue that we already use (x, y) or (x, y, z) for the coordinates of 
a point in 2- or 3-dimensional Euclidean space, so we should use the same for 
vectors. The most powerful argument will appear when we consider representing 
linear maps by matrices. 

Those who prefer column vectors point to the convenience of representing, 
say, the linear equations 

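\[ 2x + 3y = 5, \qquad 4x + 5y = 9 \]

in matrix form 

\[ \begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 5 \\ 9 \end{bmatrix}. \]
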
Statisticians also prefer column vectors: to a statistician, a vector often represents 
data from an experiment, and data are usually recorded in columns on a datasheet. 
I will use column vectors in these notes. So we make a formal definition: 

Definition 1.5 Let V be a vector space with a basis $B = (v_1, v_2, \dots, v_n)$. If 
$v = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n$, then the coordinate representation of v relative to the 
basis B is 

\[ [v]_B = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}. \]

In order to save space on the paper, we often write this as 

\[ [v]_B = \begin{bmatrix} c_1 & c_2 & \dots & c_n \end{bmatrix}^{T}. \]

The symbol T is read "transpose". 
1.4 Change of basis 

The coordinate representation of a vector is always relative to a basis. We now 
have to look at how the representation changes when we use a different basis. 

Definition 1.6 Let $B = (v_1, \dots, v_n)$ and $B' = (v'_1, \dots, v'_n)$ be bases for the n-dimensional 
vector space V over the field K. The transition matrix P from B to B' is the n × n 
matrix whose jth column is the coordinate representation $[v'_j]_B$ of the jth vector 
of B' relative to B. If we need to specify the bases, we write $P_{B,B'}$. 

Proposition 1.6 Let B and B' be bases for the n-dimensional vector space V over 
the field K. Then, for any vector $v \in V$, the coordinate representations of v with 
respect to B and B' are related by 

\[ [v]_B = P\,[v]_{B'}. \]

Proof Let $p_{ij}$ be the (i, j) entry of the matrix P. By definition, we have 

\[ v'_j = \sum_{i=1}^{n} p_{ij} v_i. \]

Take an arbitrary vector $v \in V$, and let 

\[ [v]_B = [c_1, \dots, c_n]^{T}, \qquad [v]_{B'} = [d_1, \dots, d_n]^{T}. \]

This means, by definition, that 

\[ v = \sum_{i=1}^{n} c_i v_i = \sum_{j=1}^{n} d_j v'_j. \]

Substituting the formula for $v'_j$ into the second equation, we have 

\[ v = \sum_{j=1}^{n} d_j \left( \sum_{i=1}^{n} p_{ij} v_i \right). \]

Reversing the order of summation, we get 

\[ v = \sum_{i=1}^{n} \left( \sum_{j=1}^{n} p_{ij} d_j \right) v_i. \]

Now we have two expressions for v as a linear combination of the vectors $v_i$. By 
the uniqueness of the coordinate representation, they are the same: that is, 

\[ c_i = \sum_{j=1}^{n} p_{ij} d_j. \]
In matrix form, this says 

\[ \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = P \begin{bmatrix} d_1 \\ \vdots \\ d_n \end{bmatrix}, \]

or in other words 

\[ [v]_B = P\,[v]_{B'}, \]

as required. 

In this course, we will see four ways in which matrices arise in linear algebra. 
Here is the first occurrence: matrices arise as transition matrices between bases 
of a vector space. 

The next corollary summarises how transition matrices behave. Here I denotes 
the identity matrix, the matrix having 1s on the main diagonal and 0s everywhere 
else. Given a matrix P, we denote by $P^{-1}$ the inverse of P, the matrix Q satisfying 
PQ = QP = I. Not every matrix has an inverse: we say that P is invertible or 
non-singular if it has an inverse. 

Corollary 1.7 Let B, B', B'' be bases of the vector space V. 

(a) $P_{B,B} = I$. 

(b) $P_{B',B} = (P_{B,B'})^{-1}$. 

(c) $P_{B,B''} = P_{B,B'} P_{B',B''}$. 

This follows from the preceding Proposition. For example, for (b) we have 

\[ [v]_B = P_{B,B'}\,[v]_{B'}, \qquad [v]_{B'} = P_{B',B}\,[v]_B, \]

so 

\[ [v]_B = P_{B,B'} P_{B',B}\,[v]_B. \]

By the uniqueness of the coordinate representation, we have $P_{B,B'} P_{B',B} = I$. 

Corollary 1.8 The transition matrix between any two bases of a vector space is 
invertible. 

This follows immediately from (b) of the preceding Corollary. 

Remark We see that, to express the coordinate representation w.r.t. the new 
basis in terms of that w.r.t. the old one, we need the inverse of the transition matrix: 

\[ [v]_{B'} = P_{B,B'}^{-1}\,[v]_B. \]

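Example Consider the vector space $\mathbb{R}^2$, with the two bases 

\[ B = \left( \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right), \qquad B' = \left( \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \end{bmatrix} \right). \]

The transition matrix is 

\[ P_{B,B'} = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix}, \]

whose inverse is calculated to be 

\[ P_{B',B} = \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix}. \]

So the theorem tells us that, for any $x, y \in \mathbb{R}$, we have 

\[ \begin{bmatrix} x \\ y \end{bmatrix} = x \begin{bmatrix} 1 \\ 0 \end{bmatrix} + y \begin{bmatrix} 0 \\ 1 \end{bmatrix} = (3x - 2y) \begin{bmatrix} 1 \\ 1 \end{bmatrix} + (-x + y) \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \]

as is easily checked. 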
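
The "easy check" can be carried out numerically. The sketch below assumes the transition matrix with rows (1, 2) and (1, 3) and its inverse with rows (3, -2) and (-1, 1) from the example; the helper names `mat_mul` and `mat_vec` and the sample values x = 5, y = 7 are our own choices.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_vec(p, v):
    """Product of a matrix with a column vector given as a flat list."""
    return [sum(a * b for a, b in zip(row, v)) for row in p]


P = [[1, 2], [1, 3]]        # P_{B,B'}: columns are the vectors of B' in B-coordinates
P_inv = [[3, -2], [-1, 1]]  # P_{B',B}

# The two transition matrices are inverse to each other.
assert mat_mul(P, P_inv) == [[1, 0], [0, 1]]

x, y = 5, 7
coords_new = mat_vec(P_inv, [x, y])          # [v]_{B'} = P^{-1} [v]_B
assert coords_new == [3 * x - 2 * y, -x + y]
assert mat_vec(P, coords_new) == [x, y]      # [v]_B = P [v]_{B'}
```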

1.5 Subspaces and direct sums 

Definition 1.7 A non-empty subset of a vector space is called a subspace if it 
contains the sum of any two of its elements and any scalar multiple of any of its 
elements. We write $U \le V$ to mean "U is a subspace of V". 

A subspace of a vector space is a vector space in its own right. 
Subspaces can be constructed in various ways: 

(a) Let $v_1, \dots, v_n \in V$. The span of $(v_1, \dots, v_n)$ is the set 

\[ \{ c_1 v_1 + c_2 v_2 + \cdots + c_n v_n : c_1, \dots, c_n \in K \}. \]

This is a subspace of V. Moreover, $(v_1, \dots, v_n)$ is a spanning set in this 
subspace. We denote the span of $v_1, \dots, v_n$ by $\langle v_1, \dots, v_n \rangle$. 

(b) Let $U_1$ and $U_2$ be subspaces of V. Then 

- the intersection $U_1 \cap U_2$ is the set of all vectors belonging to both $U_1$ 
and $U_2$; 

- the sum $U_1 + U_2$ is the set $\{ u_1 + u_2 : u_1 \in U_1, u_2 \in U_2 \}$ of all sums of 
vectors from the two subspaces. 

Both $U_1 \cap U_2$ and $U_1 + U_2$ are subspaces of V. 
The next result summarises some properties of these subspaces. Proofs are left 
to the reader. 

Proposition 1.9 Let V be a vector space over K. 

(a) For any $v_1, \dots, v_n \in V$, the dimension of $\langle v_1, \dots, v_n \rangle$ is at most n, with equality 
if and only if $v_1, \dots, v_n$ are linearly independent. 

(b) For any two subspaces $U_1$ and $U_2$ of V, we have 

\[ \dim(U_1 \cap U_2) + \dim(U_1 + U_2) = \dim(U_1) + \dim(U_2). \]
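
Part (b) gives a practical way to find the dimension of an intersection from spanning lists. The Python sketch below is an illustration under assumptions: it works over Q, computes dimensions of spans by Gaussian elimination (using part (a)), and applies the formula rather than verifying it independently. The helper `rank` and the subspaces chosen are ours.

```python
from fractions import Fraction


def rank(vectors):
    """Dimension of the span of the given vectors, by elimination over Q."""
    m = [[Fraction(x) for x in row] for row in vectors]
    r = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


# Two planes in Q^3, each given by a spanning list.
U1 = [(1, 0, 0), (0, 1, 0)]
U2 = [(0, 1, 0), (0, 0, 1)]

dim_u1, dim_u2 = rank(U1), rank(U2)
dim_sum = rank(U1 + U2)   # the concatenated list spans U1 + U2
dim_intersection = dim_u1 + dim_u2 - dim_sum

assert (dim_u1, dim_u2, dim_sum, dim_intersection) == (2, 2, 3, 1)
```

Here the intersection is the line spanned by (0, 1, 0), in agreement with the computed dimension 1.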

An important special case occurs when $U_1 \cap U_2$ is the zero subspace {0}. In 
this case, the sum $U_1 + U_2$ has the property that each of its elements has a unique 
expression in the form $u_1 + u_2$, for $u_1 \in U_1$ and $u_2 \in U_2$. For suppose that we had 
two different expressions for a vector v, say 

\[ v = u_1 + u_2 = u'_1 + u'_2, \qquad u_1, u'_1 \in U_1, \quad u_2, u'_2 \in U_2. \]

Then 

\[ u_1 - u'_1 = u'_2 - u_2. \]

But $u_1 - u'_1 \in U_1$, and $u'_2 - u_2 \in U_2$; so this vector is in $U_1 \cap U_2$, and by hypothesis 
it is equal to 0, so that $u_1 = u'_1$ and $u_2 = u'_2$; that is, the two expressions are 
not different after all! In this case we say that $U_1 + U_2$ is the direct sum of the 
subspaces $U_1$ and $U_2$, and write it as $U_1 \oplus U_2$. Note that 

\[ \dim(U_1 \oplus U_2) = \dim(U_1) + \dim(U_2). \]

The notion of direct sum extends to more than two summands, but is a little 
complicated to describe. We state a form which is sufficient for our purposes. 

Definition 1.8 Let $U_1, \dots, U_r$ be subspaces of the vector space V. We say that V 
is the direct sum of $U_1, \dots, U_r$, and write 

\[ V = U_1 \oplus \cdots \oplus U_r, \]

if every vector $v \in V$ can be written uniquely in the form $v = u_1 + \cdots + u_r$ with 
$u_i \in U_i$ for $i = 1, \dots, r$. 

Proposition 1.10 If $V = U_1 \oplus \cdots \oplus U_r$, then 

(a) $\dim(V) = \dim(U_1) + \cdots + \dim(U_r)$; 

(b) if $B_i$ is a basis for $U_i$ for $i = 1, \dots, r$, then $B_1 \cup \cdots \cup B_r$ is a basis for V. 



Chapter 2 

Matrices and determinants 



You have certainly seen matrices before; indeed, we met some in the first chapter 
of the notes. Here we revise matrix algebra, consider row and column operations 
on matrices, and define the rank of a matrix. Then we define the determinant of 
a square matrix axiomatically and prove that it exists (that is, there is a unique 
"determinant" function satisfying the rules we lay down), and give some methods 
of calculating it and some of its properties. Finally we prove the Cayley-Hamilton 
Theorem: every matrix satisfies its own characteristic equation. 

2.1 Matrix algebra 

Definition 2.1 A matrix of size m × n over a field K, where m and n are positive 
integers, is an array with m rows and n columns, where each entry is an element 
of K. For $1 \le i \le m$ and $1 \le j \le n$, the entry in row i and column j of A is denoted 
by $A_{ij}$, and referred to as the (i, j) entry of A. 

Example 2.1 A column vector in $K^n$ can be thought of as an n × 1 matrix, while a 
row vector is a 1 × n matrix. 

Definition 2.2 We define addition and multiplication of matrices as follows. 

(a) Let A and B be matrices of the same size m × n over K. Then the sum A + B 
is defined by adding corresponding entries: 

\[ (A + B)_{ij} = A_{ij} + B_{ij}. \]

(b) Let A be an m × n matrix and B an n × p matrix over K. Then the product 
AB is the m × p matrix whose (i, j) entry is obtained by multiplying each 
element in the ith row of A by the corresponding element in the jth column 
of B and summing: 

\[ (AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}. \]

k=l 

Remark Note that we can only add or multiply matrices if their sizes satisfy 
appropriate conditions. In particular, for a fixed value of n, we can add and multiply 
n × n matrices. It turns out that the set $M_n(K)$ of n × n matrices over K is 
a ring with identity: this means that it satisfies conditions (A0)-(A4), (M0)-(M2) 
and (D) of Appendix 1. The zero matrix, which we denote by O, is the matrix 
with every entry zero, while the identity matrix, which we denote by I, is the matrix 
with entries 1 on the main diagonal and 0 everywhere else. Note that matrix 
multiplication is not commutative: BA is usually not equal to AB. 

We already met matrix multiplication in Section 1 of the notes: recall that if 
$P_{B,B'}$ denotes the transition matrix between two bases of a vector space, then 

\[ P_{B,B'} P_{B',B''} = P_{B,B''}. \]
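
The definition of the product translates directly into code. The following Python sketch is illustrative only: the name `mat_mul` and the two sample matrices are our own, chosen to show that AB and BA can differ.

```python
def mat_mul(a, b):
    """Product of an m x n matrix a and an n x p matrix b (lists of rows)."""
    n = len(b)
    if any(len(row) != n for row in a):
        raise ValueError("columns of a must match rows of b")
    # (AB)_ij is the sum over k of A_ik * B_kj.
    return [[sum(a[i][k] * b[k][j] for k in range(n))
             for j in range(len(b[0]))] for i in range(len(a))]


A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]

assert mat_mul(A, B) == [[2, 1], [1, 1]]
assert mat_mul(B, A) == [[1, 1], [1, 2]]
# Matrix multiplication is not commutative.
assert mat_mul(A, B) != mat_mul(B, A)
```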



2.2 Row and column operations 

Given an mxn matrix A over a field K, we define certain operations on A called 
row and column operations. 

Definition 2.3 Elementary row operations There are three types: 

Type 1 Add a multiple of the jth row to the ith, where $j \ne i$. 
Type 2 Multiply the ith row by a non-zero scalar. 
Type 3 Interchange the ith and jth rows, where $j \ne i$. 

Elementary column operations There are three types: 

Type 1 Add a multiple of the jth column to the ith, where $j \ne i$. 
Type 2 Multiply the ith column by a non-zero scalar. 
Type 3 Interchange the ith and jth columns, where $j \ne i$. 

By applying these operations, we can reduce any matrix to a particularly sim- 
ple form: 
Theorem 2.1 Let A be an m × n matrix over the field K. Then it is possible to 
change A into B by elementary row and column operations, where B is a matrix 
of the same size satisfying $B_{ii} = 1$ for $1 \le i \le r$, for some $r \le \min\{m, n\}$, and all other 
entries of B are zero. 

If A can be reduced to two matrices B and B' both of the above form, where 
the numbers of non-zero elements are r and r' respectively, by different sequences 
of elementary operations, then r = r', and so B = B'. 

Definition 2.4 The number r in the above theorem is called the rank of A; while 
a matrix of the form described for B is said to be in the canonical form for equivalence. 
We can write the canonical form matrix in "block form" as 

\[ B = \begin{bmatrix} I_r & O \\ O & O \end{bmatrix}, \]

where $I_r$ is an r × r identity matrix and O denotes a zero matrix of the appropriate 
size (that is, r × (n - r), (m - r) × r, and (m - r) × (n - r) respectively for the three 
Os). Note that some or all of these Os may be missing: for example, if r = m, we 
just have $[\, I_m \ \ O \,]$. 

Proof We outline the proof that the reduction is possible. To prove that we al- 
ways get the same value of r, we need a different argument. 

The proof is by induction on the size of the matrix A: in other words, we 
assume as inductive hypothesis that any smaller matrix can be reduced as in the 
theorem. Let the matrix A be given. We proceed in steps as follows: 

• If A = O (the all-zero matrix), then the conclusion of the theorem holds, 
with r = 0; no reduction is required. So assume that $A \ne O$. 

• If $A_{11} \ne 0$, then skip this step. If $A_{11} = 0$, then there is a non-zero element 
$A_{ij}$ somewhere in A; by swapping the first and ith rows, and the first and jth 
columns, if necessary (Type 3 operations), we can bring this entry into the 
(1, 1) position. 

• Now we can assume that $A_{11} \ne 0$. Multiplying the first row by $A_{11}^{-1}$ (row 
operation Type 2), we obtain a matrix with $A_{11} = 1$. 

• Now by row and column operations of Type 1, we can assume that all the 
other elements in the first row and column are zero. For if $A_{1j} \ne 0$, then 
subtracting $A_{1j}$ times the first column from the jth gives a matrix with $A_{1j} = 0$. 
Repeat this until all non-zero elements have been removed. 

• Now let B be the matrix obtained by deleting the first row and column of A. 
Then B is smaller than A and so, by the inductive hypothesis, we can reduce 
B to canonical form by elementary row and column operations. The same 
sequence of operations applied to A now finishes the job. 



Example 2.2 Here is a small example. Let 

\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}. \]

We have $A_{11} = 1$, so we can skip the first three steps. Subtracting twice the first 
column from the second, and three times the first column from the third, gives the 
matrix 

\[ \begin{bmatrix} 1 & 0 & 0 \\ 4 & -3 & -6 \end{bmatrix}. \]

Now subtracting four times the first row from the second gives 

\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & -3 & -6 \end{bmatrix}. \]

From now on, we have to operate on the smaller matrix $[\, -3 \ \ -6 \,]$, but we continue 
to apply the operations to the large matrix. 
Multiply the second row by -1/3 to get 

\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 2 \end{bmatrix}. \]

Now subtract twice the second column from the third to obtain 

\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}. \]

We have finished the reduction, and we conclude that the rank of the original 
matrix A is equal to 2. 
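
The reduction procedure can be sketched as a short Python function. This is a minimal illustration, assuming entries in Q (via `fractions.Fraction`); it replaces the induction of the proof by a loop over the diagonal positions, and the name `canonical_form` is ours.

```python
from fractions import Fraction


def canonical_form(a):
    """Reduce a to the canonical form for equivalence; return (matrix, rank)."""
    m = [[Fraction(x) for x in row] for row in a]
    nrows, ncols = len(m), len(m[0])
    r = 0
    while r < min(nrows, ncols):
        # Look for a non-zero entry in the part of the matrix not yet reduced.
        pos = next(((i, j) for i in range(r, nrows)
                    for j in range(r, ncols) if m[i][j] != 0), None)
        if pos is None:
            break
        pi, pj = pos
        m[r], m[pi] = m[pi], m[r]               # Type 3 row operation
        for row in m:                           # Type 3 column operation
            row[r], row[pj] = row[pj], row[r]
        pivot = m[r][r]
        m[r] = [x / pivot for x in m[r]]        # Type 2 row operation
        for i in range(nrows):                  # Type 1 row operations
            if i != r and m[i][r] != 0:
                f = m[i][r]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        for j in range(r + 1, ncols):           # Type 1 column operations
            f = m[r][j]
            if f != 0:
                for row in m:
                    row[j] -= f * row[r]
        r += 1
    return m, r


B, r = canonical_form([[1, 2, 3], [4, 5, 6]])
assert B == [[1, 0, 0], [0, 1, 0]]
assert r == 2
```

The order of operations differs slightly from the hand calculation (rows are cleared before columns), but the resulting canonical form and rank are the same.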

We finish this section by describing the elementary row and column operations 

in a different way. 

For each elementary row operation on an n-rowed matrix A, we define the cor- 
responding elementary matrix by applying the same operation to the n x n identity 
matrix /. Similarly we represent elementary column operations by elementary ma- 
trices obtained by applying the same operations to the m x m identity matrix. 

We don't have to distinguish between rows and columns for our elementary 
matrices. For example, the matrix 

\[ \begin{bmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

corresponds to the elementary column operation of adding twice the first column 
to the second, or to the elementary row operation of adding twice the second 
row to the first. For the other types, the matrices for row operations and column 
operations are identical. 

Lemma 2.2 The effect of an elementary row operation on a matrix is the same as 
that of multiplying on the left by the corresponding elementary matrix. Similarly, 
the effect of an elementary column operation is the same as that of multiplying on 
the right by the corresponding elementary matrix. 

The proof of this lemma is somewhat tedious calculation. 
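
Rather than the tedious calculation, one instance of the lemma can be checked by machine. The sketch below uses the 3 × 3 elementary matrix with rows (1, 2, 0), (0, 1, 0), (0, 0, 1); the test matrix M and the helper `mat_mul` are our own choices, and a single example is of course not a proof.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


E = [[1, 2, 0], [0, 1, 0], [0, 0, 1]]
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Multiplying on the left adds twice the second row of M to the first row.
assert mat_mul(E, M) == [[9, 12, 15], [4, 5, 6], [7, 8, 9]]

# Multiplying on the right adds twice the first column of M to the second.
assert mat_mul(M, E) == [[1, 4, 3], [4, 13, 6], [7, 22, 9]]
```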

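Example 2.3 We continue our previous example. In order, here is the list of 
elementary matrices corresponding to the operations we applied to A. (Here 2 × 2 
matrices are row operations while 3 × 3 matrices are column operations.) 

\[
\begin{bmatrix} 1 & -2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 \\ -4 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 \\ 0 & -1/3 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{bmatrix}.
\]
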
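So the whole process can be written as a matrix equation:

\[
\begin{bmatrix} 1 & 0 \\ 0 & -1/3 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -4 & 1 \end{bmatrix}
A
\begin{bmatrix} 1 & -2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{bmatrix}
= B,
\]

or more simply

\[
\begin{bmatrix} 1 & 0 \\ 4/3 & -1/3 \end{bmatrix}
A
\begin{bmatrix} 1 & -2 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{bmatrix}
= B,
\]

where, as before,

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \qquad
B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.
\]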
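The matrix equation of this example can be verified by direct multiplication. Here is a short sketch using exact rational arithmetic; the function name `matmul` is hypothetical, and the entries of P, A and Q are those read off from the example.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Ordinary matrix product of X and Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

P = [[F(1), F(0)], [F(4, 3), F(-1, 3)]]      # product of the row operations
A = [[1, 2, 3], [4, 5, 6]]
Q = [[1, -2, 1], [0, 1, -2], [0, 0, 1]]      # product of the column operations

B = matmul(matmul(P, A), Q)                  # should be the canonical form
```

The result has the 2 x 2 identity in its top left corner and zeros elsewhere, in agreement with the rank 2 found by the reduction.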



An important observation about the elementary operations is that each of them
can have its effect undone by another elementary operation of the same kind,
and hence every elementary matrix is invertible, with its inverse being another
elementary matrix of the same kind. For example, the effect of adding twice the
second row to the first is undone by adding -2 times the second row to the first,
so that

\[
\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}^{-1}
= \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}.
\]


Since the product of invertible matrices is invertible, we can state the above
theorem in a more concise form. First, one more definition:

Definition 2.5 The m x n matrices A and B are said to be equivalent if B = PAQ,
where P and Q are invertible matrices of sizes m x m and n x n respectively.

Theorem 2.3 Given any m x n matrix A, there exist invertible matrices P and Q
of sizes m x m and n x n respectively, such that PAQ is in the canonical form for
equivalence.

Remark The relation "equivalence" defined above is an equivalence relation on
the set of all m x n matrices; that is, it is reflexive, symmetric and transitive.

When mathematicians talk about a "canonical form" for an equivalence re- 
lation, they mean a set of objects which are representatives of the equivalence 
classes: that is, every object is equivalent to a unique object in the canonical form. 
We have shown this for the relation of equivalence defined earlier, except for the 
uniqueness of the canonical form. This is our job for the next section. 

2.3 Rank 

We have the unfinished business of showing that the rank of a matrix is well de- 
fined; that is, no matter how we do the row and column reduction, we end up with 
the same canonical form. We do this by defining two further kinds of rank, and 
proving that all three are the same. 

Definition 2.6 Let A be an m x n matrix over a field K. We say that the column 
rank of A is the maximum number of linearly independent columns of A, while 
the row rank of A is the maximum number of linearly independent rows of A. (We 
regard columns or rows as vectors in $K^m$ and $K^n$ respectively.)

Now we need a sequence of four lemmas. 

Lemma 2.4 (a) Elementary column operations don't change the column rank 
of a matrix. 

(b) Elementary row operations don't change the column rank of a matrix. 

(c) Elementary column operations don't change the row rank of a matrix. 

(d) Elementary row operations don't change the row rank of a matrix. 

Proof (a) This is clear for Type 3 operations, which just rearrange the vectors.
For Types 1 and 2, we have to show that such an operation cannot take a linearly
independent set to a linearly dependent set; the vice versa statement holds because
the inverse of an elementary operation is another operation of the same kind.

So suppose that $v_1, \ldots, v_n$ are linearly independent. Consider a Type 1
operation involving adding c times the jth column to the ith; the new columns are
$v'_1, \ldots, v'_n$, where $v'_k = v_k$ for $k \ne i$, while $v'_i = v_i + c v_j$.
Suppose that the new vectors are linearly dependent. Then there are scalars
$a_1, \ldots, a_n$, not all zero, such that

\[
\begin{aligned}
0 &= a_1 v'_1 + \cdots + a_n v'_n \\
  &= a_1 v_1 + \cdots + a_i (v_i + c v_j) + \cdots + a_j v_j + \cdots + a_n v_n \\
  &= a_1 v_1 + \cdots + a_i v_i + \cdots + (a_j + c a_i) v_j + \cdots + a_n v_n .
\end{aligned}
\]

Since $v_1, \ldots, v_n$ are linearly independent, we conclude that

\[
a_1 = 0, \ \ldots, \ a_i = 0, \ \ldots, \ a_j + c a_i = 0, \ \ldots, \ a_n = 0,
\]

from which we see that all the $a_k$ are zero, contrary to assumption. So the new
columns are linearly independent.

The argument for Type 2 operations is similar but easier. 

(b) It is easily checked that, if an elementary row operation is applied, then the 
new vectors satisfy exactly the same linear relations as the old ones (that is, the 
same linear combinations are zero). So the linearly independent sets of vectors
don't change at all.

(c) Same as (b), but applied to rows. 

(d) Same as (a), but applied to rows. 

Theorem 2.5 For any matrix A, the row rank, the column rank, and the rank are 
all equal. In particular, the rank is independent of the row and column operations 
used to compute it. 

Proof Suppose that we reduce A to canonical form B by elementary operations, 
where B has rank r. These elementary operations don't change the row or column 
rank, by our lemma; so the row ranks of A and B are equal, and their column ranks 
are equal. But it is trivial to see that, if

\[
B = \begin{bmatrix} I_r & O \\ O & O \end{bmatrix},
\]

then the row and column ranks of B are both equal to r. So the theorem is proved.
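As an illustration of the theorem, the following sketch computes the row rank of a matrix by row reduction, and the column rank as the row rank of the transpose. The names `rank` and `transpose` are hypothetical helpers, and the test matrices are chosen only as examples.

```python
from fractions import Fraction

def rank(M):
    """Number of pivots found by Gaussian elimination on the rows of M."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # Look for a non-zero entry in this column at or below row r.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1, 2, 3], [4, 5, 6]]
row_rank = rank(A)
col_rank = rank(transpose(A))   # row rank of the transpose = column rank of A
```

For this matrix both numbers come out as 2, the rank obtained earlier by reduction to canonical form.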

We can get an extra piece of information from our deliberations. Let A be an
invertible n x n matrix. Then the canonical form of A is just I: its rank is equal
to n. This means that there are matrices P and Q, each a product of elementary
matrices, such that

\[ PAQ = I_n. \]

From this we deduce that

\[ A = P^{-1} I_n Q^{-1} = P^{-1} Q^{-1}; \]

in other words:

Corollary 2.6 Every invertible square matrix is a product of elementary matrices. 

In fact, we learn a little bit more. We observed, when we defined elementary 
matrices, that they can represent either elementary column operations or elemen- 
tary row operations. So, when we have written A as a product of elementary 
matrices, we can choose to regard them as representing column operations, and 
we see that A can be obtained from the identity by applying elementary column 
operations. If we now apply the inverse operations in the other order, they will turn 
A into the identity (which is its canonical form). In other words, the following is 
true: 

Corollary 2.7 If A is an invertible n x n matrix, then A can be transformed into
the identity matrix by elementary column operations alone (or by elementary row 
operations alone). 



2.4 Determinants 

The determinant is a function defined on square matrices; its value is a scalar. 
It has some very important properties: perhaps most important is the fact that a 
matrix is invertible if and only if its determinant is not equal to zero. 

We denote the determinant function by det, so that det(A) is the determinant
of A. For a matrix written out as an array, the determinant is denoted by replacing
the square brackets by vertical bars:

\[
\det \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
= \begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix}.
\]

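You have met determinants in earlier courses, and you know the formula for
the determinant of a 2 x 2 or 3 x 3 matrix:

\[
\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc, \qquad
\begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix}
= aei + bfg + cdh - afh - bdi - ceg.
\]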



Our first job is to define the determinant for square matrices of any size. We do
this in an "axiomatic" manner:

Definition 2.7 A function D defined on n x n matrices is a determinant if it
satisfies the following three conditions:

(D1) For $1 \le i \le n$, D is a linear function of the ith column: this means that,
if A and A' are two matrices which agree everywhere except the ith column, and
if A'' is the matrix whose ith column is c times the ith column of A plus c'
times the ith column of A', but agreeing with A and A' everywhere else, then

\[ D(A'') = c D(A) + c' D(A'). \]

(D2) If A has two equal columns, then D(A) = 0.

(D3) $D(I_n) = 1$, where $I_n$ is the n x n identity matrix.

We show the following result: 

Theorem 2.8 There is a unique determinant function on n x n matrices, for any n.

Proof First, we show that applying elementary column operations to A has a
well-defined effect on D(A).

(a) If B is obtained from A by adding c times the jth column to the ith, then
D(B) = D(A).

(b) If B is obtained from A by multiplying the ith column by a non-zero scalar
c, then D(B) = cD(A).

(c) If B is obtained from A by interchanging two columns, then D(B) = -D(A).

For (a), let A' be the matrix which agrees with A in all columns except the ith,
which is equal to the jth column of A. By rule (D2), D(A') = 0. By rule (D1),

\[ D(B) = D(A) + c D(A') = D(A). \]

Part (b) follows immediately from rule (D1).

To prove part (c), we observe that we can interchange the ith and jth columns 
by the following sequence of operations: 

• add the ith column to the jth; 

• multiply the ith column by —1; 

• add the jth column to the ith; 

• subtract the ith column from the jth. 






In symbols,

\[
(c_i, c_j) \mapsto (c_i, c_j + c_i) \mapsto (-c_i, c_j + c_i)
\mapsto (c_j, c_j + c_i) \mapsto (c_j, c_i).
\]

The first, third and fourth steps don't change the value of D, while the second
multiplies it by -1.

Now we take the matrix A and apply elementary column operations to it, keep- 
ing track of the factors by which D gets multiplied according to rules (a)-(c). The 
overall effect is to multiply D(A) by a certain non-zero scalar c, depending on the 
operations. 

• If A is invertible, then we can reduce A to the identity, so that cD(A) =
D(I) = 1, whence $D(A) = c^{-1}$.

• If A is not invertible, then its column rank is less than n. So the columns of A
are linearly dependent, and one column can be written as a linear combination
of the others. Applying axiom (D1), we see that D(A) is a linear combination
of values D(A'), where A' are matrices with two equal columns; so
D(A') = 0 for all such A', whence D(A) = 0.

This proves that the determinant function, if it exists, is unique. We show its 
existence in the next section, by giving a couple of formulae for it. 

Given the uniqueness of the determinant function, we now denote it by det(A)
instead of D(A). The proof of the theorem shows an important corollary:

Corollary 2.9 A square matrix A is invertible if and only if $\det(A) \ne 0$.

Proof See the case division at the end of the proof of the theorem.

One of the most important properties of the determinant is the following. 

Theorem 2.10 If A and B are n x n matrices over K, then det(AB) = det(A) det(B).

Proof Suppose first that B is not invertible. Then det(B) = 0. Also, AB is not
invertible. (For, suppose that $(AB)^{-1} = X$, so that XAB = I. Then XA is the
inverse of B.) So det(AB) = 0, and the theorem is true.

In the other case, B is invertible, so we can apply a sequence of elementary
column operations to B to get to the identity. The effect of these operations is
to multiply the determinant by a non-zero factor c (depending on the operations),
so that c det(B) = 1, or $c = (\det(B))^{-1}$. Now these operations are represented
by elementary matrices; so we see that BQ = I, where Q is a product of elementary
matrices.

If we apply the same sequence of elementary operations to AB, we end up with
the matrix (AB)Q = A(BQ) = AI = A. The determinant is multiplied by the same
factor, so we find that c det(AB) = det(A). Since $c = (\det(B))^{-1}$, this implies
that det(AB) = det(A) det(B), as required.
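A quick numerical check of the theorem in the 2 x 2 case may be helpful. This is only a sketch: `det2` and `matmul` are hypothetical helper names, and the two matrices are arbitrary examples.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix by the formula ad - bc."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def matmul(X, Y):
    """Ordinary matrix product of X and Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [-2, 3]]
AB = matmul(A, B)

lhs = det2(AB)             # determinant of the product
rhs = det2(A) * det2(B)    # product of the determinants
```

Of course a check on one pair of matrices proves nothing; it merely illustrates the statement.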

Finally, we have defined determinants using columns, but we could have used 
rows instead: 

Proposition 2.11 The determinant is the unique function D of n x n matrices
which satisfies the conditions

(D1') for $1 \le i \le n$, D is a linear function of the ith row;
(D2') if two rows of A are equal, then D(A) = 0;
(D3') $D(I_n) = 1$.

The proof of uniqueness is almost identical to that for columns. To see that
D(A) = det(A): if A is not invertible, then D(A) = det(A) = 0; but if A is invertible,
then it is a product of elementary matrices (which can represent either row or
column operations), and the determinant is the product of the factors associated
with these operations.

Corollary 2.12 If $A^T$ denotes the transpose of A, then $\det(A^T) = \det(A)$.

For, if D denotes the "determinant" computed by row operations, then
$\det(A) = D(A) = \det(A^T)$, since row operations on A correspond to column
operations on $A^T$.

2.5 Calculating determinants 

We now give a couple of formulae for the determinant. This finishes the job we 
left open in the proof of the last theorem, namely, showing that a determinant 
function actually exists! 

The first formula involves some background notation. 

Definition 2.8 A permutation of {1, ..., n} is a bijection from the set {1, ..., n}
to itself. The symmetric group $S_n$ consists of all permutations of the set
{1, ..., n}. (There are n! such permutations.) For any permutation $\pi \in S_n$,
there is a number $\operatorname{sign}(\pi) = \pm 1$, computed as follows: write
$\pi$ as a product of disjoint cycles; if there are k cycles (including cycles of
length 1), then $\operatorname{sign}(\pi) = (-1)^{n-k}$. A transposition is a
permutation which interchanges two symbols and leaves all the others fixed. Thus,
if $\tau$ is a transposition, then $\operatorname{sign}(\tau) = -1$.

The last fact holds because a transposition has one cycle of size 2 and n - 2
cycles of size 1, so n - 1 altogether; so
$\operatorname{sign}(\tau) = (-1)^{n-(n-1)} = -1$.

We need one more fact about signs: if $\pi$ is any permutation and $\tau$ is a
transposition, then $\operatorname{sign}(\pi\tau) = -\operatorname{sign}(\pi)$,
where $\pi\tau$ denotes the composition of $\pi$ and $\tau$ (apply first $\tau$,
then $\pi$).

Definition 2.9 Let A be an n x n matrix over K. The determinant of A is defined
by the formula

\[
\det(A) = \sum_{\pi \in S_n} \operatorname{sign}(\pi)\,
A_{1\pi(1)} A_{2\pi(2)} \cdots A_{n\pi(n)}.
\]

Proof In order to show that this is a good definition, we need to verify that it
satisfies our three rules (D1)-(D3).

(D1) According to the definition, det(A) is a sum of n! terms. Each term, apart
from a sign, is the product of n elements, one from each row and column. If
we look at a particular column, say the ith, it is clear that each product is a
linear function of that column; so the same is true for the determinant.

(D2) Suppose that the ith and jth columns of A are equal. Let $\tau$ be the
transposition which interchanges i and j and leaves the other symbols fixed. For
any $\pi \in S_n$, consider the permutation $\tau\pi$ (apply first $\pi$, then
$\tau$): it sends k to $\tau(\pi(k))$, which is equal to $\pi(k)$ unless $\pi(k)$
is i or j, in which case i and j are exchanged. Because the elements in the ith
and jth columns of A are the same, we have $A_{k\,\tau\pi(k)} = A_{k\,\pi(k)}$ for
every k, and so the products $A_{1\pi(1)} A_{2\pi(2)} \cdots A_{n\pi(n)}$ and
$A_{1\,\tau\pi(1)} A_{2\,\tau\pi(2)} \cdots A_{n\,\tau\pi(n)}$ are equal. But
$\operatorname{sign}(\tau\pi) = -\operatorname{sign}(\pi)$: indeed, $\tau\pi$ is
the inverse of $\pi^{-1}\tau$, a permutation and its inverse have the same cycle
structure and hence the same sign, and
$\operatorname{sign}(\pi^{-1}\tau) = -\operatorname{sign}(\pi^{-1})$ by the fact
above. So the corresponding terms in the formula for the determinant cancel one
another. The elements of $S_n$ can be divided up into n!/2 pairs of the form
$\{\pi, \tau\pi\}$. As we have seen, each pair of terms in the formula cancel out.
We conclude that det(A) = 0. Thus (D2) holds.

(D3) If $A = I_n$, then the only permutation $\pi$ which contributes to the sum is
the identity permutation $\iota$: for any other permutation $\pi$ satisfies
$\pi(i) \ne i$ for some i, so that $A_{i\pi(i)} = 0$. The sign of $\iota$ is +1,
and all the terms $A_{i\iota(i)} = A_{ii}$ are equal to 1; so det(A) = 1, as
required.

This gives us a nice mathematical formula for the determinant of a matrix.
Unfortunately, it is a terrible formula in practice, since it involves working out
n! terms, each a product of matrix entries, and adding them up with + and -
signs. For n of moderate size, this will take a very long time! (For example,
10! = 3628800.)
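To make the formula concrete, here is a direct (and deliberately naive) transcription into code. It is a sketch only: the names `sign` and `det_leibniz` are hypothetical, symbols are numbered from 0 rather than 1, and the sign is computed from the number of cycles exactly as in Definition 2.8.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation of 0..n-1, computed as (-1)**(n - k),
    where k is the number of cycles (fixed points count as cycles)."""
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:      # walk around the cycle containing start
                seen[j] = True
                j = perm[j]
    return (-1) ** (n - cycles)

def det_leibniz(A):
    """Sum over all permutations of sign(pi) * A[0][pi(0)] * ... * A[n-1][pi(n-1)]."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total
```

The loop runs over all n! permutations, so this is usable only for very small n, as the text warns.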

Here is a second formula, which is also theoretically important but very
inefficient in practice.

Definition 2.10 Let A be an n x n matrix. For $1 \le i, j \le n$, we define the
(i, j) minor of A to be the (n - 1) x (n - 1) matrix obtained by deleting the ith
row and jth column of A. Now we define the (i, j) cofactor of A to be $(-1)^{i+j}$
times the determinant of the (i, j) minor. (These signs have a chessboard pattern,
starting with sign + in the top left corner.) We denote the (i, j) cofactor of A
by $K_{ij}(A)$. Finally, the adjugate of A is the n x n matrix Adj(A) whose (i, j)
entry is the (j, i) cofactor $K_{ji}(A)$ of A. (Note the transposition!)

Theorem 2.13 (a) For $1 \le j \le n$, we have

\[ \det(A) = \sum_{i=1}^{n} A_{ij} K_{ij}(A). \]

(b) For $1 \le i \le n$, we have

\[ \det(A) = \sum_{j=1}^{n} A_{ij} K_{ij}(A). \]

This theorem says that, if we take any column or row of A, multiply each 
element by the corresponding cofactor, and add the results, we get the determinant 
of A. 

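Example 2.4 Using a cofactor expansion along the first column, we see that

\[
\begin{aligned}
\begin{vmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{vmatrix}
&= \begin{vmatrix} 5 & 6 \\ 8 & 10 \end{vmatrix}
 - 4 \begin{vmatrix} 2 & 3 \\ 8 & 10 \end{vmatrix}
 + 7 \begin{vmatrix} 2 & 3 \\ 5 & 6 \end{vmatrix} \\
&= (5 \cdot 10 - 6 \cdot 8) - 4(2 \cdot 10 - 3 \cdot 8) + 7(2 \cdot 6 - 3 \cdot 5) \\
&= 2 + 16 - 21 \\
&= -3,
\end{aligned}
\]

using the standard formula for a 2 x 2 determinant.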
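The expansion along the first column translates directly into a recursive procedure. The following is a minimal sketch with hypothetical names `minor` and `det_cofactor`; rows and columns are indexed from 0, so the sign $(-1)^{i+j}$ for the first column becomes `(-1) ** i`.

```python
def minor(A, i, j):
    """The matrix obtained from A by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det_cofactor(A):
    """Determinant by cofactor expansion along the first column."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det_cofactor(minor(A, i, 0))
               for i in range(n))

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
value = det_cofactor(A)
```

Each call on an n x n matrix makes n calls on (n - 1) x (n - 1) matrices, so the amount of work still grows like n!.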



Proof We prove (a); the proof for (b) is a simple modification, using rows instead
of columns. Let D(A) be the function defined by the right-hand side of (a) in the
theorem, using the jth column of A. We verify rules (D1)-(D3).

(D1) It is clear that D(A) is a linear function of the jth column. For $k \ne j$,
the cofactors are linear functions of the kth column (since they are determinants),
and so D(A) is linear.

(D2) If the kth and lth columns of A are equal, where k and l are both different
from j, then each cofactor is the determinant of a matrix with two equal columns,
and so is zero. The harder case is when the jth column is equal to another, say
the kth. Using induction, each cofactor can be expressed as a sum of elements of
the kth column times (n - 2) x (n - 2) determinants. In the resulting sum, it is
easy to see that each such determinant occurs twice with opposite signs and
multiplied by the same factor. So the terms all cancel.

(D3) Suppose that A = I. The only non-zero cofactor in the jth column is
$K_{jj}(I)$, which is equal to $(-1)^{j+j} \det(I_{n-1}) = 1$. So D(I) = 1.

By the main theorem, the expression D(A) is equal to det(A).

At first sight, this looks like a simple formula for the determinant, since it is
just the sum of n terms, rather than n! as in the first case. But each term is an
(n - 1) x (n - 1) determinant. Working down the chain we find that this method
is just as labour-intensive as the other one.

But the cofactor expansion has further nice properties: 

Theorem 2.14 For any n x n matrix A, we have

\[ A \cdot \operatorname{Adj}(A) = \operatorname{Adj}(A) \cdot A = \det(A) \cdot I. \]

Proof We calculate the matrix product. Recall that the (i, j) entry of Adj(A) is
$K_{ji}(A)$.

Now the (i, i) entry of the product $A \cdot \operatorname{Adj}(A)$ is

\[
\sum_{k=1}^{n} A_{ik} (\operatorname{Adj}(A))_{ki}
= \sum_{k=1}^{n} A_{ik} K_{ik}(A) = \det(A),
\]

by the cofactor expansion. On the other hand, if $i \ne j$, then the (i, j) entry
of the product is

\[
\sum_{k=1}^{n} A_{ik} (\operatorname{Adj}(A))_{kj}
= \sum_{k=1}^{n} A_{ik} K_{jk}(A).
\]

This last expression is the cofactor expansion of the matrix A' which is the same
as A except for the jth row, which has been replaced by the ith row of A. (Note
that changing the jth row of a matrix has no effect on the cofactors of elements in
this row.) So the sum is det(A'). But A' has two equal rows, so its determinant is
zero.

Thus $A \cdot \operatorname{Adj}(A)$ has entries det(A) on the diagonal and 0
everywhere else; so it is equal to $\det(A) \cdot I$.

The proof for the product the other way around is the same, using columns
instead of rows.

Corollary 2.15 If the n x n matrix A is invertible, then its inverse is equal to

\[ (\det(A))^{-1} \operatorname{Adj}(A). \]
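The theorem and its corollary can be tried out on a small matrix. The sketch below assumes integer entries (so that exact fractions can be formed) and uses hypothetical helper names `minor`, `det`, `adjugate`, `inverse` and `matmul`, with rows and columns indexed from 0.

```python
from fractions import Fraction

def minor(A, i, j):
    """The matrix obtained from A by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Cofactor expansion along the first column (the empty matrix has det 1)."""
    if not A:
        return 1
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(len(A)))

def adjugate(A):
    """Matrix whose (i, j) entry is the (j, i) cofactor of A."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]

def inverse(A):
    """Inverse as det(A)^(-1) * Adj(A); assumes integer entries."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(x, d) for x in row] for row in adjugate(A)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
product = matmul(A, adjugate(A))     # expected to be det(A) times the identity
```

As the text explains, this is a theoretical tool rather than an efficient way of inverting matrices.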

So how can you work out a determinant efficiently? The best method in prac- 
tice is to use elementary operations. 

Apply elementary operations to the matrix, keeping track of the factor by
which the determinant is multiplied by each operation. If you want, you can
reduce all the way to the identity, and then use the fact that det(I) = 1. Often it is
simpler to stop at an earlier stage when you can recognise what the determinant is.
For example, if the matrix A has diagonal entries $a_1, \ldots, a_n$, and all
off-diagonal entries are zero, then det(A) is just the product $a_1 \cdots a_n$.



Example 2.5 Let

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{bmatrix}.
\]

Subtracting twice the first column from the second, and three times the first
column from the third (these operations don't change the determinant) gives

\[
\begin{bmatrix} 1 & 0 & 0 \\ 4 & -3 & -6 \\ 7 & -6 & -11 \end{bmatrix}.
\]

Now the cofactor expansion along the first row gives

\[
\det(A) = \begin{vmatrix} -3 & -6 \\ -6 & -11 \end{vmatrix} = 33 - 36 = -3.
\]

(At the last step, it is easiest to use the formula for the determinant of a 2 x 2
matrix rather than do any further reduction.)
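The method of elementary operations is easy to mechanise. The sketch below is one possible version, not the procedure of the example itself: it uses row operations rather than column operations (which is legitimate by Proposition 2.11), reduces to triangular rather than diagonal form, and the name `det_by_elimination` is hypothetical.

```python
from fractions import Fraction

def det_by_elimination(M):
    """Determinant via row operations, tracking how each one changes det."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    det = Fraction(1)
    for col in range(n):
        pivot = next((i for i in range(col, n) if A[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # no pivot: the matrix is not invertible
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            det = -det                  # interchanging two rows changes the sign
        det *= A[col][col]              # the diagonal entry contributes a factor
        for i in range(col + 1, n):
            factor = A[i][col] / A[col][col]
            # Adding a multiple of one row to another leaves det unchanged.
            A[i] = [x - factor * y for x, y in zip(A[i], A[col])]
    return det
```

This takes a number of arithmetic operations proportional to n^3, in contrast to the n! terms of the permutation formula.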



2.6 The Cayley-Hamilton Theorem 

Since we can add and multiply matrices, we can substitute them into a polynomial.
For example, if

\[
A = \begin{bmatrix} 0 & 1 \\ -2 & 3 \end{bmatrix},
\]

then the result of substituting A into the polynomial $x^2 - 3x + 2$ is

\[
A^2 - 3A + 2I
= \begin{bmatrix} -2 & 3 \\ -6 & 7 \end{bmatrix}
+ \begin{bmatrix} 0 & -3 \\ 6 & -9 \end{bmatrix}
+ \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.
\]

We say that the matrix A satisfies the equation $x^2 - 3x + 2 = 0$. (Notice that for
the constant term 2 we substituted 2I.)

It turns out that, for every n x n matrix A, we can calculate a polynomial
equation of degree n satisfied by A.

Definition 2.11 Let A be an n x n matrix. The characteristic polynomial of A is
the polynomial

\[ c_A(x) = \det(xI - A). \]

This is a polynomial in x of degree n.

For example, if

\[
A = \begin{bmatrix} 0 & 1 \\ -2 & 3 \end{bmatrix},
\]

then

\[
c_A(x) = \begin{vmatrix} x & -1 \\ 2 & x - 3 \end{vmatrix}
= x(x - 3) + 2 = x^2 - 3x + 2.
\]



Indeed, it turns out that this is the polynomial we want in general: 

Theorem 2.16 (Cayley-Hamilton Theorem) Let A be an n x n matrix with
characteristic polynomial $c_A(x)$. Then $c_A(A) = O$.



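Example 2.6 Let us just check the theorem for 2 x 2 matrices. If

\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix},
\]

then

\[
c_A(x) = \begin{vmatrix} x - a & -b \\ -c & x - d \end{vmatrix}
= x^2 - (a + d)x + (ad - bc),
\]

and so

\[
c_A(A) = \begin{bmatrix} a^2 + bc & ab + bd \\ ac + cd & bc + d^2 \end{bmatrix}
- (a + d) \begin{bmatrix} a & b \\ c & d \end{bmatrix}
+ (ad - bc) \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
= O,
\]

after a small amount of calculation.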
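The "small amount of calculation" can be delegated to a few lines of code. This is a sketch for the 2 x 2 case only; the function names `matmul` and `char_poly_at_self` are hypothetical.

```python
def matmul(X, Y):
    """Ordinary matrix product of X and Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def char_poly_at_self(A):
    """For a 2 x 2 matrix A, evaluate A^2 - (a + d) A + (ad - bc) I."""
    (a, b), (c, d) = A
    trace = a + d
    determinant = a * d - b * c
    A2 = matmul(A, A)
    I = [[1, 0], [0, 1]]
    return [[A2[i][j] - trace * A[i][j] + determinant * I[i][j]
             for j in range(2)] for i in range(2)]

A = [[0, 1], [-2, 3]]            # characteristic polynomial x^2 - 3x + 2
result = char_poly_at_self(A)    # expected to be the zero matrix
```

The same function returns the zero matrix for any other 2 x 2 matrix with integer entries, as the theorem predicts.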
Proof We use the theorem

\[ A \cdot \operatorname{Adj}(A) = \det(A) \cdot I. \]

In place of A, we put the matrix xI - A into this formula:

\[ (xI - A) \operatorname{Adj}(xI - A) = \det(xI - A) I = c_A(x) I. \]

Now it is very tempting just to substitute x = A into this formula: on the
right we have $c_A(A) I = c_A(A)$, while on the left there is a factor AI - A = O.
Unfortunately this is not valid; it is important to see why. The matrix
Adj(xI - A) is an n x n matrix whose entries are determinants of
(n - 1) x (n - 1) matrices with entries involving x. So the entries of
Adj(xI - A) are polynomials in x, and if we try to substitute A for x the size
of the matrix will be changed!

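Instead, we argue as follows. As we have said, Adj(xI - A) is a matrix whose
entries are polynomials, so we can write it as a sum of powers of x times matrices,
that is, as a polynomial whose coefficients are matrices. For example,

\[
\begin{bmatrix} x^2 + 1 & 2x \\ 3x - 4 & x + 2 \end{bmatrix}
= x^2 \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
+ x \begin{bmatrix} 0 & 2 \\ 3 & 1 \end{bmatrix}
+ \begin{bmatrix} 1 & 0 \\ -4 & 2 \end{bmatrix}.
\]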

The entries in Adj(xI - A) are (n - 1) x (n - 1) determinants, so the highest
power of x that can arise is $x^{n-1}$. So we can write

\[
\operatorname{Adj}(xI - A) = x^{n-1} B_{n-1} + x^{n-2} B_{n-2} + \cdots + x B_1 + B_0,
\]

for suitable n x n matrices $B_0, \ldots, B_{n-1}$. Hence

\[
\begin{aligned}
c_A(x) I &= (xI - A) \operatorname{Adj}(xI - A) \\
&= (xI - A)(x^{n-1} B_{n-1} + x^{n-2} B_{n-2} + \cdots + x B_1 + B_0) \\
&= x^n B_{n-1} + x^{n-1}(-A B_{n-1} + B_{n-2}) + \cdots + x(-A B_1 + B_0) - A B_0.
\end{aligned}
\]

So, if we let

\[ c_A(x) = x^n + c_{n-1} x^{n-1} + \cdots + c_1 x + c_0, \]

then we read off that

\[
\begin{aligned}
B_{n-1} &= I, \\
-A B_{n-1} + B_{n-2} &= c_{n-1} I, \\
&\;\;\vdots \\
-A B_1 + B_0 &= c_1 I, \\
-A B_0 &= c_0 I.
\end{aligned}
\]

We take this system of equations, and multiply the first by $A^n$, the second by
$A^{n-1}$, ..., and the last by $A^0 = I$. What happens? On the left, all the terms
cancel in pairs: we have

\[
A^n B_{n-1} + A^{n-1}(-A B_{n-1} + B_{n-2}) + \cdots + A(-A B_1 + B_0) + I(-A B_0) = O.
\]

On the right, we have

\[ A^n + c_{n-1} A^{n-1} + \cdots + c_1 A + c_0 I = c_A(A). \]

So $c_A(A) = O$, as claimed.

Chapter 3

Linear maps between vector spaces 



We return to the setting of vector spaces in order to define linear maps between 
them. We will see that these maps can be represented by matrices, decide when 
two matrices represent the same linear map, and give another proof of the canon- 
ical form for equivalence. 

3.1 Definition and basic properties 

Definition 3.1 Let V and W be vector spaces over a field K. A function $\alpha$
from V to W is a linear map if it preserves addition and scalar multiplication,
that is, if

• $\alpha(v_1 + v_2) = \alpha(v_1) + \alpha(v_2)$ for all $v_1, v_2 \in V$;

• $\alpha(cv) = c\alpha(v)$ for all $v \in V$ and $c \in K$.

Remarks 1. We can combine the two conditions into one as follows:

\[ \alpha(c_1 v_1 + c_2 v_2) = c_1 \alpha(v_1) + c_2 \alpha(v_2). \]

2. In other literature the term "linear transformation" is often used instead of
"linear map".

Definition 3.2 Let $\alpha : V \to W$ be a linear map. The image of $\alpha$ is
the set

\[ \operatorname{Im}(\alpha) = \{ w \in W : w = \alpha(v) \text{ for some } v \in V \}, \]

and the kernel of $\alpha$ is

\[ \operatorname{Ker}(\alpha) = \{ v \in V : \alpha(v) = 0 \}. \]

Proposition 3.1 Let $\alpha : V \to W$ be a linear map. Then the image of $\alpha$
is a subspace of W and the kernel is a subspace of V.

Proof We have to show that each is closed under addition and scalar
multiplication. For the image, if $w_1 = \alpha(v_1)$ and $w_2 = \alpha(v_2)$, then

\[ w_1 + w_2 = \alpha(v_1) + \alpha(v_2) = \alpha(v_1 + v_2), \]

and if $w = \alpha(v)$ then

\[ cw = c\alpha(v) = \alpha(cv). \]

For the kernel, if $\alpha(v_1) = \alpha(v_2) = 0$ then

\[ \alpha(v_1 + v_2) = \alpha(v_1) + \alpha(v_2) = 0 + 0 = 0, \]

and if $\alpha(v) = 0$ then

\[ \alpha(cv) = c\alpha(v) = c0 = 0. \]

Definition 3.3 We define the rank of $\alpha$ to be
$\rho(\alpha) = \dim(\operatorname{Im}(\alpha))$ and the nullity of $\alpha$ to be
$\nu(\alpha) = \dim(\operatorname{Ker}(\alpha))$. (We use the Greek letters 'rho'
and 'nu' here to avoid confusing the rank of a linear map with the rank of a
matrix, though they will turn out to be closely related!)

Theorem 3.2 (Rank-Nullity Theorem) Let $\alpha : V \to W$ be a linear map. Then
$\rho(\alpha) + \nu(\alpha) = \dim(V)$.

Proof Choose a basis $u_1, u_2, \ldots, u_q$ for $\operatorname{Ker}(\alpha)$,
where $q = \dim(\operatorname{Ker}(\alpha)) = \nu(\alpha)$. The vectors
$u_1, \ldots, u_q$ are linearly independent vectors of V, so we can add further
vectors to get a basis for V, say $u_1, \ldots, u_q, v_1, \ldots, v_s$, where
$q + s = \dim(V)$.

We claim that the vectors $\alpha(v_1), \ldots, \alpha(v_s)$ form a basis for
$\operatorname{Im}(\alpha)$. We have to show that they are linearly independent
and spanning.

Linearly independent: Suppose that
$c_1 \alpha(v_1) + \cdots + c_s \alpha(v_s) = 0$. Then
$\alpha(c_1 v_1 + \cdots + c_s v_s) = 0$, so that
$c_1 v_1 + \cdots + c_s v_s \in \operatorname{Ker}(\alpha)$. But then this vector
can be expressed in terms of the basis for $\operatorname{Ker}(\alpha)$:

\[ c_1 v_1 + \cdots + c_s v_s = a_1 u_1 + \cdots + a_q u_q, \]

whence

\[ -a_1 u_1 - \cdots - a_q u_q + c_1 v_1 + \cdots + c_s v_s = 0. \]

But the u's and v's form a basis for V, so they are linearly independent. So
this equation implies that all the a's and c's are zero. The fact that
$c_1 = \cdots = c_s = 0$ shows that the vectors
$\alpha(v_1), \ldots, \alpha(v_s)$ are linearly independent.

Spanning: Take any vector in $\operatorname{Im}(\alpha)$, say w. Then
$w = \alpha(v)$ for some $v \in V$. Write v in terms of the basis for V:

\[ v = a_1 u_1 + \cdots + a_q u_q + c_1 v_1 + \cdots + c_s v_s \]

for some $a_1, \ldots, a_q, c_1, \ldots, c_s$. Applying $\alpha$, we get

\[
\begin{aligned}
w &= \alpha(v) \\
  &= a_1 \alpha(u_1) + \cdots + a_q \alpha(u_q) + c_1 \alpha(v_1) + \cdots + c_s \alpha(v_s) \\
  &= c_1 w_1 + \cdots + c_s w_s,
\end{aligned}
\]

since $\alpha(u_i) = 0$ (as $u_i \in \operatorname{Ker}(\alpha)$) and
$\alpha(v_i) = w_i$. So the vectors $w_1, \ldots, w_s$ span
$\operatorname{Im}(\alpha)$.

Thus, $\rho(\alpha) = \dim(\operatorname{Im}(\alpha)) = s$. Since
$\nu(\alpha) = q$ and $q + s = \dim(V)$, the theorem is proved.
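For a map given by a matrix, both numbers in the theorem can be computed independently and compared. The sketch below assumes the map is $v \mapsto Av$ on column vectors; it counts pivots to get the rank, and constructs an explicit basis of the kernel to get the nullity. All function names (`rref`, `kernel_basis`, `apply`) are hypothetical, and the matrix is an arbitrary 2 x 3 example.

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form of M, together with the list of pivot columns."""
    A = [[Fraction(x) for x in row] for row in M]
    pivots, r = [], 0
    for col in range(len(A[0])):
        p = next((i for i in range(r, len(A)) if A[i][col] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        lead = A[r][col]
        A[r] = [x / lead for x in A[r]]          # make the pivot equal to 1
        for i in range(len(A)):
            if i != r:
                f = A[i][col]
                A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        pivots.append(col)
        r += 1
    return A, pivots

def kernel_basis(M):
    """One kernel vector for each non-pivot (free) column."""
    R, pivots = rref(M)
    n = len(M[0])
    basis = []
    for free in range(n):
        if free in pivots:
            continue
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row, pc in zip(R, pivots):
            v[pc] = -row[free]                   # solve for the pivot variables
        basis.append(v)
    return basis

def apply(M, v):
    """The matrix-vector product Mv."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[1, 2, 3], [1, 5, -1]]      # a map from K^3 to K^2
_, pivot_columns = rref(A)
rank = len(pivot_columns)        # dimension of the image
kernel = kernel_basis(A)
nullity = len(kernel)            # dimension of the kernel
```

Here the rank is 2 and the kernel is spanned by a single vector, so the two numbers add up to 3, the dimension of the domain.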

3.2 Representation by matrices 

We come now to the second role of matrices in linear algebra: they represent
linear maps between vector spaces.

Let $\alpha : V \to W$ be a linear map, where dim(V) = m and dim(W) = n. As we
saw in the first section, we can take V and W in their coordinate representation:
$V = K^m$ and $W = K^n$ (the elements of these vector spaces being represented as
column vectors). Let $e_1, \ldots, e_m$ be the standard basis for V (so that $e_i$
is the vector with ith coordinate 1 and all other coordinates zero), and
$f_1, \ldots, f_n$ the standard basis for W. Then for i = 1, ..., m, the vector
$\alpha(e_i)$ belongs to W, so we can write it as a linear combination of
$f_1, \ldots, f_n$.

Definition 3.4 The matrix representing the linear map $\alpha : V \to W$ relative
to the bases $B = (e_1, \ldots, e_m)$ for V and $C = (f_1, \ldots, f_n)$ for W is
the n x m matrix whose (i, j) entry is $a_{ij}$, where

\[ \alpha(e_i) = \sum_{j=1}^{n} a_{ji} f_j \]

for i = 1, ..., m.

In practice this means the following. Take $\alpha(e_i)$ and write it as a column
vector $[\,a_{1i} \ a_{2i} \ \cdots \ a_{ni}\,]^T$. This vector is the ith column
of the matrix representing $\alpha$. So, for example, if m = 3, n = 2, and

\[
\alpha(e_1) = f_1 + f_2, \quad \alpha(e_2) = 2f_1 + 5f_2, \quad \alpha(e_3) = 3f_1 - f_2,
\]

then the vectors $\alpha(e_i)$ as column vectors are

\[
\alpha(e_1) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad
\alpha(e_2) = \begin{bmatrix} 2 \\ 5 \end{bmatrix}, \quad
\alpha(e_3) = \begin{bmatrix} 3 \\ -1 \end{bmatrix},
\]

and so the matrix representing $\alpha$ is

\[
\begin{bmatrix} 1 & 2 & 3 \\ 1 & 5 & -1 \end{bmatrix}.
\]

Now the most important thing about this representation is that the action of a 
is now easily described: 

Proposition 3.3 Let $\alpha : V \to W$ be a linear map. Choose bases for V and W
and let A be the matrix representing $\alpha$. Then, if we represent vectors of V
and W as column vectors relative to these bases, we have

\[ \alpha(v) = Av. \]

Proof Let $e_1, \ldots, e_m$ be the basis for V, and $f_1, \ldots, f_n$ for W.
Take $v = \sum_{i=1}^{m} c_i e_i \in V$, so that in coordinates

\[ v = [\,c_1 \ \cdots \ c_m\,]^T. \]

Then

\[
\alpha(v) = \sum_{i=1}^{m} c_i \alpha(e_i)
= \sum_{i=1}^{m} \sum_{j=1}^{n} c_i a_{ji} f_j,
\]

so the jth coordinate of $\alpha(v)$ is $\sum_{i=1}^{m} a_{ji} c_i$, which is
precisely the jth coordinate in the matrix product Av.

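In our example, if $v = 2e_1 + 3e_2 + 4e_3 = [\,2 \ 3 \ 4\,]^T$, then

\[
\alpha(v) = Av =
\begin{bmatrix} 1 & 2 & 3 \\ 1 & 5 & -1 \end{bmatrix}
\begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}
= \begin{bmatrix} 20 \\ 13 \end{bmatrix}.
\]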
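The calculation in this example can be reproduced in a few lines. This is a sketch with a hypothetical helper `apply` for the matrix-vector product; vectors are plain lists of coordinates.

```python
def apply(M, v):
    """The matrix-vector product Mv, with v given as a list of coordinates."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[1, 2, 3], [1, 5, -1]]      # matrix of the map in the example

# The images of the standard basis vectors are the columns of A.
images = [apply(A, e) for e in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]

v = [2, 3, 4]                    # v = 2e_1 + 3e_2 + 4e_3
image_of_v = apply(A, v)
```

The list `images` recovers the three column vectors written down above, and `image_of_v` is the product Av.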





Addition and multiplication of linear maps correspond to addition and
multiplication of the matrices representing them.

Definition 3.5 Let $\alpha$ and $\beta$ be linear maps from V to W. Define their
sum $\alpha + \beta$ by the rule

\[ (\alpha + \beta)(v) = \alpha(v) + \beta(v) \]

for all $v \in V$. It is easy to check that $\alpha + \beta$ is a linear map.

Proposition 3.4 If $\alpha$ and $\beta$ are linear maps represented by matrices A
and B respectively, then $\alpha + \beta$ is represented by the matrix A + B.

The proof of this is not difficult: just use the definitions. 

Definition 3.6 Let U, V, W be vector spaces over K, and let
$\alpha : U \to V$ and $\beta : V \to W$ be linear maps. The product
$\beta\alpha$ is the function $U \to W$ defined by the rule

\[ (\beta\alpha)(u) = \beta(\alpha(u)) \]

for all $u \in U$. Again it is easily checked that $\beta\alpha$ is a linear map.
Note that the order is important: we take a vector $u \in U$, apply $\alpha$ to it
to get a vector in V, and then apply $\beta$ to get a vector in W. So
$\beta\alpha$ means "apply $\alpha$, then $\beta$".

Proposition 3.5 If $\alpha : U \to V$ and $\beta : V \to W$ are linear maps
represented by matrices A and B respectively, then $\beta\alpha$ is represented by
the matrix BA.

Again the proof is tedious but not difficult. Of course it follows that a linear 
map is invertible (as a map; that is, there is an inverse map) if and only if it is 
represented by an invertible matrix. 

Remark Let l = dim(U), m = dim(V) and n = dim(W). Then A is m x l, and B
is n x m; so the product BA is defined, and is n x l, which is the right size for a
matrix representing a map from an l-dimensional to an n-dimensional space.
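Proposition 3.5 and the remark on sizes can be illustrated on a small case. In the sketch below the matrices are arbitrary examples with l = 2, m = 3, n = 2, and `matmul` and `apply` are hypothetical helper names.

```python
def matmul(X, Y):
    """Ordinary matrix product of X and Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def apply(M, v):
    """The matrix-vector product Mv."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[1, 2], [0, 1], [3, -1]]    # represents alpha : U -> V, size m x l = 3 x 2
B = [[1, 0, 2], [0, 1, 1]]       # represents beta : V -> W, size n x m = 2 x 3
u = [5, 7]

step_by_step = apply(B, apply(A, u))   # apply alpha, then beta
BA = matmul(B, A)                      # size n x l = 2 x 2
in_one_go = apply(BA, u)
```

Both routes give the same vector, which is what it means for BA to represent the product of the two maps.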

The significance of all this is that the strange rule for multiplying matrices is 
chosen so as to make Proposition 3.5 hold. The definition of multiplication of 
linear maps is the natural one (composition), and we could then say: what defini- 
tion of matrix multiplication should we choose to make the Proposition valid? We 
would find that the usual definition was forced upon us. 

3.3 Change of basis 

The matrix representing a linear map depends on the choice of bases we used to 
represent it. Now we have to discuss what happens if we change the basis. 

Remember the notion of transition matrix from Chapter 1. If
$B = (v_1, \ldots, v_m)$ and $B' = (v'_1, \ldots, v'_m)$ are two bases for a vector
space V, the transition matrix $P_{B,B'}$ is the matrix whose jth column is the
coordinate representation of $v'_j$ in the basis B. Then we have

\[ [v]_B = P [v]_{B'}, \]

where $[v]_B$ is the coordinate representation of an arbitrary vector in the basis
B, and similarly for B'. The inverse of $P_{B,B'}$ is $P_{B',B}$. Let $p_{ij}$ be
the (i, j) entry of $P = P_{B,B'}$.

Now let $C = (w_1, \ldots, w_n)$ and $C' = (w'_1, \ldots, w'_n)$ be two bases for a
space W, with transition matrix $Q_{C,C'}$ and inverse $Q_{C',C}$. Let
$Q = Q_{C,C'}$ and let $R = Q_{C',C}$ be its inverse, with (i, j) entry $r_{ij}$.

Let a be a linear map from V to W. Then a is represented by a matrix A 
using the bases B and C, and by a matrix A' using the bases B' and C'. What is the 
relation between A and A'? 

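We just do it and see. To get A', we have to represent the vectors
$\alpha(v'_j)$ in the basis C'. We have

\[ v'_j = \sum_{i=1}^{m} p_{ij} v_i, \]

so

\[
\begin{aligned}
\alpha(v'_j) &= \sum_{i=1}^{m} p_{ij} \alpha(v_i) \\
&= \sum_{i=1}^{m} \sum_{k=1}^{n} p_{ij} A_{ki} w_k \\
&= \sum_{i=1}^{m} \sum_{k=1}^{n} \sum_{l=1}^{n} p_{ij} A_{ki} r_{lk} w'_l .
\end{aligned}
\]

This means, on turning things around, that

\[ (A')_{lj} = \sum_{k=1}^{n} \sum_{i=1}^{m} r_{lk} A_{ki} p_{ij}, \]

so, according to the rules of matrix multiplication,

\[ A' = RAP = Q^{-1}AP. \]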
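A small numerical case may make the formula less abstract. The sketch below takes V = W = K^2 with the standard bases as B and C, and two arbitrarily chosen invertible matrices P and Q as transition matrices; these choices, and the helper names `matmul` and `inverse_2x2`, are assumptions made only for illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Ordinary matrix product of X and Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of a 2 x 2 integer matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A = [[1, 2], [3, 4]]     # matrix of the map relative to the bases B and C
P = [[1, 1], [0, 1]]     # columns: the vectors of B' in terms of B
Q = [[2, 0], [0, 1]]     # columns: the vectors of C' in terms of C

A_prime = matmul(matmul(inverse_2x2(Q), A), P)   # A' = Q^{-1} A P
```

One way to check the result without using the inverse is to observe that QA' and AP must agree: both express the images of the vectors of B' in the basis C.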

Proposition 3.6 Let $\alpha : V \to W$ be a linear map represented by matrix $A$ relative
to the bases $B$ for $V$ and $C$ for $W$, and by the matrix $A'$ relative to the bases $B'$ for
$V$ and $C'$ for $W$. If $P = P_{B,B'}$ and $Q = Q_{C,C'}$ are the transition matrices from the
unprimed to the primed bases, then

\[ A' = Q^{-1}AP. \]

This is rather technical; you need it for explicit calculations, but for theoretical 
purposes the importance is the following corollary. Recall that two matrices A and 
B are equivalent if B is obtained from A by multiplying on the left and right by 
invertible matrices. (It makes no difference that we said $B = PAQ$ before and
$B = Q^{-1}AP$ here, of course.)
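
The change-of-basis formula of Proposition 3.6 can be checked numerically. The following is a minimal sketch using exact rational arithmetic; the matrices `A`, `P`, `Q` and the helper names `matmul` and `inverse_2x2` are illustrative choices, not taken from the text. It computes $A' = Q^{-1}AP$ and confirms the equivalent statement $QA' = AP$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Exact inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# A represents a map from a 3-dimensional V to a 2-dimensional W,
# relative to the old bases (illustrative numbers only).
A = [[1, 2, 0],
     [0, 1, 3]]
# Columns of P: the new basis vectors of V in old coordinates.
P = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
# Columns of Q: the new basis vectors of W in old coordinates.
Q = [[2, 1],
     [1, 1]]

# A' = Q^{-1} A P represents the same map relative to the new bases.
A_new = matmul(inverse_2x2(Q), matmul(A, P))
```

Multiplying back by `Q` recovers `A P`, which is the relation $QA' = AP$ read column by column: column $j$ of $AP$ is the image of the $j$th new basis vector in old coordinates.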
Proposition 3.7 Two matrices represent the same linear map with respect to
different bases if and only if they are equivalent.

This holds because 

• transition matrices are always invertible (the inverse of $P_{B,B'}$ is the matrix
$P_{B',B}$ for the transition in the other direction); and

• any invertible matrix can be regarded as a transition matrix: for, if the $n \times n$
matrix $P$ is invertible, then its rank is $n$, so its columns are linearly
independent, and form a basis $B'$ for $K^n$; and then $P = P_{B,B'}$, where $B$ is the
"standard basis".

3.4 Canonical form revisited 

Now we can give a simpler proof of Theorem 2.3 about canonical form for equiv- 
alence. First, we make the following observation. 

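Theorem 3.8 Let $\alpha : V \to W$ be a linear map of rank $r = \rho(\alpha)$. Then there are
bases for $V$ and $W$ such that the matrix representing $\alpha$ is, in block form,

\[ \begin{bmatrix} I_r & O \\ O & O \end{bmatrix}. \]

Proof As in the proof of Theorem 3.2, choose a basis $u_1, \ldots, u_s$ for $\mathrm{Ker}(\alpha)$, and
extend to a basis $u_1, \ldots, u_s, v_1, \ldots, v_r$ for $V$. Then $\alpha(v_1), \ldots, \alpha(v_r)$ is a basis for
$\mathrm{Im}(\alpha)$, and so can be extended to a basis $\alpha(v_1), \ldots, \alpha(v_r), x_1, \ldots, x_t$ for $W$. Now
we will use the bases

$v_1, \ldots, v_r, v_{r+1} = u_1, \ldots, v_{r+s} = u_s$ for $V$,
$w_1 = \alpha(v_1), \ldots, w_r = \alpha(v_r), w_{r+1} = x_1, \ldots, w_{r+t} = x_t$ for $W$.

We have

\[ \alpha(v_i) = \begin{cases} w_i & \text{if } 1 \le i \le r, \\ 0 & \text{otherwise;} \end{cases} \]

so the matrix of $\alpha$ relative to these bases is

\[ \begin{bmatrix} I_r & O \\ O & O \end{bmatrix} \]

as claimed.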
We recognise the matrix in the theorem as the canonical form for equivalence.
Combining Theorem 3.8 with Proposition 3.7, we see:

Theorem 3.9 A matrix of rank $r$ is equivalent to the matrix

\[ \begin{bmatrix} I_r & O \\ O & O \end{bmatrix}. \]

We also see, by the way, that the rank of a linear map (that is, the dimension 
of its image) is equal to the rank of any matrix which represents it. So all our 
definitions of rank agree! 

The conclusion is that 

two matrices are equivalent if and only if they have the same rank. 

So how many equivalence classes of $m \times n$ matrices are there, for given $m$ and $n$?
The rank of such a matrix can take any value from 0 up to the minimum of $m$ and
$n$; so the number of equivalence classes is $\min\{m,n\} + 1$.
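
Since equivalence is decided by rank alone, testing it reduces to a rank computation. The sketch below is one possible way to do this over the rationals by Gaussian elimination; the function names `rank`, `equivalent` and `number_of_classes` are hypothetical helpers introduced for illustration.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        # Look for a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear the entries below the pivot.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def equivalent(A, B):
    """Two matrices of the same size are equivalent iff they have the same rank."""
    same_shape = len(A) == len(B) and len(A[0]) == len(B[0])
    return same_shape and rank(A) == rank(B)

def number_of_classes(m, n):
    """Number of equivalence classes of m x n matrices: ranks 0, 1, ..., min(m, n)."""
    return min(m, n) + 1
```

For instance, `[[1, 2, 3], [2, 4, 6]]` and `[[0, 0, 0], [0, 0, 5]]` both have rank 1 and so are equivalent, although they look quite different.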



Chapter 4 

Linear maps on a vector space 



In this chapter we consider a linear map $\alpha$ from a vector space $V$ to itself. If
$\dim(V) = n$ then, as in the last chapter, we can represent $\alpha$ by an $n \times n$ matrix
relative to any basis for $V$. However, this time we have less freedom: instead of
having two bases to choose, there is only one. This makes the theory much more
interesting!

4.1 Projections and direct sums 

We begin by looking at a particular type of linear map whose importance will be 
clear later on. 

Definition 4.1 The linear map $\pi : V \to V$ is a projection if $\pi^2 = \pi$ (where, as
usual, $\pi^2$ is defined by $\pi^2(v) = \pi(\pi(v))$).

Proposition 4.1 If $\pi : V \to V$ is a projection, then $V = \mathrm{Im}(\pi) \oplus \mathrm{Ker}(\pi)$.

Proof We have two things to do:

$\mathrm{Im}(\pi) + \mathrm{Ker}(\pi) = V$: Take any vector $v \in V$, and let $w = \pi(v) \in \mathrm{Im}(\pi)$. We
claim that $v - w \in \mathrm{Ker}(\pi)$. This holds because

\[ \pi(v - w) = \pi(v) - \pi(w) = \pi(v) - \pi(\pi(v)) = \pi(v) - \pi^2(v) = 0, \]

since $\pi^2 = \pi$. Now $v = w + (v - w)$ is the sum of a vector in $\mathrm{Im}(\pi)$ and one
in $\mathrm{Ker}(\pi)$.

$\mathrm{Im}(\pi) \cap \mathrm{Ker}(\pi) = \{0\}$: Take $v \in \mathrm{Im}(\pi) \cap \mathrm{Ker}(\pi)$. Then $v = \pi(w)$ for some vector
$w$; and

\[ 0 = \pi(v) = \pi(\pi(w)) = \pi^2(w) = \pi(w) = v, \]

as required (the first equality holding because $v \in \mathrm{Ker}(\pi)$).
It goes the other way too: if $V = U \oplus W$, then there is a projection $\pi : V \to V$
with $\mathrm{Im}(\pi) = U$ and $\mathrm{Ker}(\pi) = W$. For every vector $v \in V$ can be uniquely written
as $v = u + w$, where $u \in U$ and $w \in W$; we define $\pi$ by the rule that $\pi(v) = u$. Now
the assertions are clear.

The diagram in Figure 4.1 shows geometrically what a projection is. It moves
any vector $v$ in a direction parallel to $\mathrm{Ker}(\pi)$ to a vector lying in $\mathrm{Im}(\pi)$.




Figure 4.1: A projection 

We can extend this to direct sums with more than two terms. First, notice that
if $\pi$ is a projection and $\pi' = I - \pi$ (where $I$ is the identity map, satisfying $I(v) = v$
for all vectors $v$), then $\pi'$ is also a projection, since

\[ (\pi')^2 = (I - \pi)^2 = I - 2\pi + \pi^2 = I - 2\pi + \pi = I - \pi = \pi'; \]

and $\pi + \pi' = I$; also $\pi\pi' = \pi(I - \pi) = \pi - \pi^2 = O$. Finally, we see that $\mathrm{Ker}(\pi) =
\mathrm{Im}(\pi')$; so $V = \mathrm{Im}(\pi) \oplus \mathrm{Im}(\pi')$. In this form the result extends:

Proposition 4.2 Suppose that $\pi_1, \pi_2, \ldots, \pi_r$ are projections on $V$ satisfying

(a) $\pi_1 + \pi_2 + \cdots + \pi_r = I$, where $I$ is the identity transformation;

(b) $\pi_i \pi_j = O$ for $i \ne j$.

Then $V = U_1 \oplus U_2 \oplus \cdots \oplus U_r$, where $U_i = \mathrm{Im}(\pi_i)$.

Proof We have to show that any vector $v$ can be uniquely written in the form
$v = u_1 + u_2 + \cdots + u_r$, where $u_i \in U_i$ for $i = 1, \ldots, r$. We have

\[ v = I(v) = \pi_1(v) + \pi_2(v) + \cdots + \pi_r(v) = u_1 + u_2 + \cdots + u_r, \]

where $u_i = \pi_i(v) \in \mathrm{Im}(\pi_i)$ for $i = 1, \ldots, r$. So any vector can be written in this
form. Now suppose that we have any expression

\[ v = u'_1 + u'_2 + \cdots + u'_r, \]

with $u'_i \in U_i$ for $i = 1, \ldots, r$. Since $u'_i \in U_i = \mathrm{Im}(\pi_i)$, we have $u'_i = \pi_i(v_i)$ for some
$v_i$; then

\[ \pi_i(u'_i) = \pi_i^2(v_i) = \pi_i(v_i) = u'_i. \]

On the other hand, for $j \ne i$, we have

\[ \pi_i(u'_j) = \pi_i \pi_j(v_j) = 0, \]

since $\pi_i \pi_j = O$. So applying $\pi_i$ to the expression for $v$, we obtain

\[ \pi_i(v) = \pi_i(u'_1) + \pi_i(u'_2) + \cdots + \pi_i(u'_r) = \pi_i(u'_i) = u'_i, \]

since all terms in the sum except the $i$th are zero. So the only possible expression
is given by $u_i = \pi_i(v)$, and the proof is complete.

Conversely, if $V = U_1 \oplus U_2 \oplus \cdots \oplus U_r$, then we can find projections $\pi_1, \pi_2, \ldots, \pi_r$
satisfying the conditions of the above Proposition. For any vector $v \in V$ has a
unique expression as

\[ v = u_1 + u_2 + \cdots + u_r \]

with $u_i \in U_i$ for $i = 1, \ldots, r$; then we define $\pi_i(v) = u_i$.

The point of this is that projections give us another way to recognise and de- 
scribe direct sums. 
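
As a small illustration of the correspondence between direct sums and projections, the sketch below builds the matrix of the projection of $K^2$ onto the line spanned by `u` along the line spanned by `w`. The construction $M E M^{-1}$ (with `M` having columns `u`, `w`) and all names in the code are assumptions made for this example; they are one convenient way to realise the rule $\pi(u + w) = u$ in coordinates.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Exact inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def projection_onto(u, w):
    """Matrix of the projection of K^2 onto span(u) along span(w).

    Assumes u and w are linearly independent, so K^2 = span(u) + span(w) is direct.
    """
    M = [[u[0], w[0]], [u[1], w[1]]]   # columns are u and w
    E = [[1, 0], [0, 0]]               # keeps the u-coordinate, kills the w-coordinate
    return matmul(matmul(M, E), inverse_2x2(M))

u, w = [3, 4], [2, 3]
pi = projection_onto(u, w)         # image span(u), kernel span(w)
pi_prime = projection_onto(w, u)   # image span(w), kernel span(u)
```

The two matrices satisfy the conditions of Proposition 4.2 with $r = 2$: each is idempotent, their sum is the identity, and their product is zero.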

4.2 Linear maps and matrices 

Let $\alpha : V \to V$ be a linear map. If we choose a basis $v_1, \ldots, v_n$ for $V$, then $V$ can
be written in coordinates as $K^n$, and $\alpha$ is represented by a matrix $A$, say, where

\[ \alpha(v_i) = \sum_{j=1}^{n} a_{ji} v_j. \]

Then just as in the last section, the action of $\alpha$ on $V$ is represented by the action of
$A$ on $K^n$: $\alpha(v)$ is represented by the product $Av$. Also, as in the last chapter, sums
and products (and hence arbitrary polynomials) of linear maps are represented by
sums and products of the representing matrices: that is, for any polynomial $f(x)$,
the map $f(\alpha)$ is represented by the matrix $f(A)$.

What happens if we change the basis? This also follows from the formula we 
worked out in the last chapter. However, there is only one basis to change. 
Proposition 4.3 Let $\alpha$ be a linear map on $V$ which is represented by the matrix $A$
relative to a basis $B$, and by the matrix $A'$ relative to a basis $B'$. Let $P = P_{B,B'}$ be
the transition matrix between the two bases. Then

\[ A' = P^{-1}AP. \]

Proof This is just Proposition 3.6, since $P$ and $Q$ are the same here.

Definition 4.2 Two $n \times n$ matrices $A$ and $B$ are said to be similar if $B = P^{-1}AP$
for some invertible matrix $P$.

Thus similarity is an equivalence relation, and 

two matrices are similar if and only if they represent the same linear 
map with respect to different bases. 

There is no simple canonical form for similarity like the one for equivalence 
that we met earlier. For the rest of this section we look at a special class of ma- 
trices or linear maps, the "diagonalisable" ones, where we do have a nice simple 
representative of the similarity class. In the final section we give without proof a 
general result for the complex numbers. 

4.3 Eigenvalues and eigenvectors 

Definition 4.3 Let $\alpha$ be a linear map on $V$. A vector $v \in V$ is said to be an
eigenvector of $\alpha$, with eigenvalue $\lambda \in K$, if $v \ne 0$ and $\alpha(v) = \lambda v$. The set
$\{v : \alpha(v) = \lambda v\}$ consisting of the zero vector and the eigenvectors with eigenvalue $\lambda$
is called the $\lambda$-eigenspace of $\alpha$.

Note that we require that $v \ne 0$; otherwise the zero vector would be an
eigenvector for any value of $\lambda$. With this requirement, each eigenvector has a unique
eigenvalue: for if $\alpha(v) = \lambda v = \mu v$, then $(\lambda - \mu)v = 0$, and so (since $v \ne 0$) we
have $\lambda = \mu$.

The name eigenvalue is a mixture of German and English; it means "characteristic
value" or "proper value" (here "proper" is used in the sense of "property").
Another term used in older books is "latent root". Here "latent" means "hidden":
the idea is that the eigenvalue is somehow hidden in a matrix representing $\alpha$, and
we have to extract it by some procedure. We'll see how to do this soon.






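Example Let

\[ A = \begin{bmatrix} -6 & 6 \\ -12 & 11 \end{bmatrix}. \]

The vector $v = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$ satisfies

\[ \begin{bmatrix} -6 & 6 \\ -12 & 11 \end{bmatrix} \begin{bmatrix} 3 \\ 4 \end{bmatrix} = 2 \begin{bmatrix} 3 \\ 4 \end{bmatrix}, \]

so is an eigenvector with eigenvalue 2. Similarly, the vector $w = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$ is an
eigenvector with eigenvalue 3.

If we knew that, for example, 2 is an eigenvalue of $A$, then we could find a
corresponding eigenvector $\begin{bmatrix} x \\ y \end{bmatrix}$ by solving the linear equations

\[ \begin{bmatrix} -6 & 6 \\ -12 & 11 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = 2 \begin{bmatrix} x \\ y \end{bmatrix}. \]

In the next-but-one section, we will see how to find the eigenvalues, and the fact
that there cannot be more than $n$ of them for an $n \times n$ matrix.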
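
The computations in this example are easy to reproduce. The following sketch checks the two eigenvectors and solves $(A - \lambda I)v = 0$ for a known eigenvalue of a $2 \times 2$ matrix; the helpers `apply`, `is_eigenvector` and `eigenvector_2x2` are hypothetical names, and the solving routine is deliberately limited to the $2 \times 2$ case.

```python
def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def is_eigenvector(M, v, lam):
    """True if v is non-zero and M v = lam v."""
    return any(x != 0 for x in v) and apply(M, v) == [lam * x for x in v]

def eigenvector_2x2(M, lam):
    """Return a non-zero solution of (M - lam I) v = 0, or None if lam is not an eigenvalue."""
    a, b = M[0][0] - lam, M[0][1]
    c, d = M[1][0], M[1][1] - lam
    if a * d - b * c != 0:
        return None            # M - lam I is invertible, so only v = 0 solves the system
    if (a, b) != (0, 0):
        return [-b, a]         # orthogonal to the first row; the second row is a multiple of it
    if (c, d) != (0, 0):
        return [-d, c]         # first row is zero, so only the second row matters
    return [1, 0]              # M - lam I is the zero matrix: every non-zero vector works

A = [[-6, 6], [-12, 11]]
v = eigenvector_2x2(A, 2)      # some scalar multiple of [3, 4]
```

The vector returned need not equal $[3, 4]^\top$ itself, since any non-zero scalar multiple of an eigenvector is again an eigenvector with the same eigenvalue.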



4.4 Diagonalisability 

Some linear maps have a particularly simple representation by matrices. 

Definition 4.4 The linear map $\alpha$ on $V$ is diagonalisable if there is a basis of $V$
relative to which the matrix representing $\alpha$ is a diagonal matrix.

Suppose that $v_1, \ldots, v_n$ is such a basis showing that $\alpha$ is diagonalisable. Then
$\alpha(v_i) = a_{ii} v_i$ for $i = 1, \ldots, n$, where $a_{ii}$ is the $i$th diagonal entry of the diagonal
matrix $A$. Thus, the basis vectors are eigenvectors. Conversely, if we have a basis
of eigenvectors, then the matrix representing $\alpha$ is diagonal. So:



Proposition 4.4 The linear map $\alpha$ on $V$ is diagonalisable if and only if there is a
basis of $V$ consisting of eigenvectors of $\alpha$.
Example The matrix $\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$ is not diagonalisable. It is easy to see that its only
eigenvalue is 1, and the only eigenvectors are scalar multiples of $[\,1 \;\; 0\,]^\top$. So we
cannot find a basis of eigenvectors.

Theorem 4.5 Let $\alpha : V \to V$ be a linear map. Then the following are equivalent:

(a) $\alpha$ is diagonalisable;

(b) $V$ is the direct sum of eigenspaces of $\alpha$;

(c) $\alpha = \lambda_1 \pi_1 + \cdots + \lambda_r \pi_r$, where $\lambda_1, \ldots, \lambda_r$ are the distinct eigenvalues of $\alpha$,
and $\pi_1, \ldots, \pi_r$ are projections satisfying $\pi_1 + \cdots + \pi_r = I$ and $\pi_i \pi_j = O$ for
$i \ne j$.

Proof Let $\lambda_1, \ldots, \lambda_r$ be the distinct eigenvalues of $\alpha$, and let $v_{i1}, \ldots, v_{im_i}$ be a
basis for the $\lambda_i$-eigenspace of $\alpha$. Then $\alpha$ is diagonalisable if and only if the union
of these bases is a basis for $V$. So (a) and (b) are equivalent.

Now suppose that (b) holds. Proposition 4.2 and its converse show that there
are projections $\pi_1, \ldots, \pi_r$ satisfying the conditions of (c) where $\mathrm{Im}(\pi_i)$ is the
$\lambda_i$-eigenspace. Now in this case it is easily checked that $\alpha$ and $\sum \lambda_i \pi_i$ agree on every
vector in $V$, so they are equal. So (b) implies (c).

Finally, if $\alpha = \sum \lambda_i \pi_i$, where the $\pi_i$ satisfy the conditions of (c), then $V$ is the
direct sum of the spaces $\mathrm{Im}(\pi_i)$, and $\mathrm{Im}(\pi_i)$ is the $\lambda_i$-eigenspace. So (c) implies
(b), and we are done.



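Example Our matrix $A = \begin{bmatrix} -6 & 6 \\ -12 & 11 \end{bmatrix}$ is diagonalisable, since the eigenvectors
$\begin{bmatrix} 3 \\ 4 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$ are linearly independent, and so form a basis for $\mathbb{R}^2$. Indeed, we see
that

\[
\begin{bmatrix} -6 & 6 \\ -12 & 11 \end{bmatrix}
\begin{bmatrix} 3 & 2 \\ 4 & 3 \end{bmatrix}
=
\begin{bmatrix} 3 & 2 \\ 4 & 3 \end{bmatrix}
\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix},
\]

so that $P^{-1}AP$ is diagonal, where $P$ is the matrix whose columns are the
eigenvectors of $A$.

Furthermore, one can find two projection matrices whose column spaces are
the eigenspaces, namely

\[
P_1 = \begin{bmatrix} 9 & -6 \\ 12 & -8 \end{bmatrix}, \qquad
P_2 = \begin{bmatrix} -8 & 6 \\ -12 & 9 \end{bmatrix}.
\]

Check directly that $P_1^2 = P_1$, $P_2^2 = P_2$, $P_1P_2 = P_2P_1 = O$, $P_1 + P_2 = I$, and
$2P_1 + 3P_2 = A$.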

This expression for a diagonalisable matrix A in terms of projections is useful 
in calculating powers of A, or polynomials in A. 

Proposition 4.6 Let

\[ A = \sum_{i=1}^{r} \lambda_i P_i \]

be the expression for the diagonalisable matrix $A$ in terms of projections $P_i$
satisfying the conditions of Theorem 4.5, that is, $\sum_{i=1}^{r} P_i = I$ and $P_iP_j = O$ for $i \ne j$.
Then

(a) for any positive integer $m$, we have

\[ A^m = \sum_{i=1}^{r} \lambda_i^m P_i; \]

(b) for any polynomial $f(x)$, we have

\[ f(A) = \sum_{i=1}^{r} f(\lambda_i) P_i. \]

Proof (a) The proof is by induction on $m$, the case $m = 1$ being the given
expression. Suppose that the result holds for $m = k - 1$. Then

\[ A^k = A^{k-1}A = \left( \sum_{i=1}^{r} \lambda_i^{k-1} P_i \right) \left( \sum_{i=1}^{r} \lambda_i P_i \right). \]

When we multiply out this product, all the terms $P_iP_j$ are zero for $i \ne j$, and we
obtain simply $\sum_{i=1}^{r} \lambda_i^{k-1} \lambda_i P_i$ (using $P_i^2 = P_i$), as required. So the induction goes through.

(b) If $f(x) = \sum a_m x^m$, we obtain the result by multiplying the equation of part
(a) by $a_m$ and summing over $m$. (Note that, for $m = 0$, we use the fact that

\[ A^0 = I = \sum_{i=1}^{r} P_i = \sum_{i=1}^{r} \lambda_i^0 P_i, \]

that is, part (a) holds also for $m = 0$.)
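
Part (a) of Proposition 4.6 can be confirmed on the running example $A = 2P_1 + 3P_2$. The sketch below compares powers computed by repeated multiplication with powers computed from the projections; the function names are illustrative and not part of the text.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_power(M, m):
    """M^m by repeated multiplication (m >= 0)."""
    result = identity(len(M))
    for _ in range(m):
        result = matmul(result, M)
    return result

def power_via_projections(eigenvalues, projections, m):
    """Sum of lambda_i^m P_i, the right-hand side of Proposition 4.6(a)."""
    n = len(projections[0])
    return [[sum(lam ** m * P[i][j] for lam, P in zip(eigenvalues, projections))
             for j in range(n)] for i in range(n)]

A = [[-6, 6], [-12, 11]]
P1 = [[9, -6], [12, -8]]     # projection onto the 2-eigenspace
P2 = [[-8, 6], [-12, 9]]     # projection onto the 3-eigenspace

A_cubed = power_via_projections([2, 3], [P1, P2], 3)   # 8 P1 + 27 P2
```

The advantage of the projection form is that the cost of computing $A^m$ no longer depends on $m$ beyond raising the scalars $\lambda_i$ to the $m$th power.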
4.5 Characteristic and minimal polynomials

We defined the determinant of a square matrix $A$. Now we want to define the
determinant of a linear map $\alpha$. The obvious way to do this is to take the determinant
of any matrix representing $\alpha$. For this to be a good definition, we need to show
that it doesn't matter which matrix we take; in other words, that $\det(A') = \det(A)$
if $A$ and $A'$ are similar. But, if $A' = P^{-1}AP$, then

\[ \det(P^{-1}AP) = \det(P^{-1})\det(A)\det(P) = \det(A), \]

since $\det(P^{-1})\det(P) = 1$. So our plan will succeed:

Definition 4.5 (a) The determinant $\det(\alpha)$ of a linear map $\alpha : V \to V$ is the
determinant of any matrix representing $\alpha$.

(b) The characteristic polynomial $c_\alpha(x)$ of a linear map $\alpha : V \to V$ is the
characteristic polynomial of any matrix representing $\alpha$.

(c) The minimal polynomial $m_\alpha(x)$ of a linear map $\alpha : V \to V$ is the monic
polynomial of smallest degree which is satisfied by $\alpha$.

The second part of the definition is OK, by the same reasoning as the first
(since $c_\alpha(x)$ is just a determinant). But the third part also creates a bit of a problem:
how do we know that $\alpha$ satisfies any polynomial? The Cayley-Hamilton Theorem
tells us that $c_A(A) = O$ for any matrix $A$ representing $\alpha$. Now $c_A(A)$ represents
$c_A(\alpha)$, and $c_A = c_\alpha$ by definition; so $c_\alpha(\alpha) = O$. Indeed, the Cayley-Hamilton
Theorem can be stated in the following form:

Proposition 4.7 For any linear map $\alpha$ on $V$, its minimal polynomial $m_\alpha(x)$
divides its characteristic polynomial $c_\alpha(x)$ (as polynomials).

Proof Suppose not; then we can divide $c_\alpha(x)$ by $m_\alpha(x)$, getting a quotient $q(x)$
and non-zero remainder $r(x)$; that is,

\[ c_\alpha(x) = m_\alpha(x)q(x) + r(x). \]

Substituting $\alpha$ for $x$, using the fact that $c_\alpha(\alpha) = m_\alpha(\alpha) = O$, we find that $r(\alpha) = O$.
But the degree of $r$ is less than the degree of $m_\alpha$, so this contradicts the
definition of $m_\alpha$ as the polynomial of least degree satisfied by $\alpha$.

Theorem 4.8 Let $\alpha$ be a linear map on $V$. Then the following conditions are
equivalent for an element $\lambda$ of $K$:

(a) $\lambda$ is an eigenvalue of $\alpha$;

(b) $\lambda$ is a root of the characteristic polynomial of $\alpha$;

(c) $\lambda$ is a root of the minimal polynomial of $\alpha$.
Remark: This gives us a recipe to find the eigenvalues of $\alpha$: take a matrix $A$
representing $\alpha$; write down its characteristic polynomial $c_A(x) = \det(xI - A)$; and
find the roots of this polynomial. In our earlier example,

\[
\begin{vmatrix} x - 0.9 & -0.3 \\ -0.1 & x - 0.7 \end{vmatrix}
= (x - 0.9)(x - 0.7) - 0.03 = x^2 - 1.6x + 0.6 = (x - 1)(x - 0.6),
\]

so the eigenvalues are 1 and 0.6, as we found.

Proof (b) implies (a): Suppose that $c_\alpha(\lambda) = 0$, that is, $\det(\lambda I - \alpha) = 0$. Then
$\lambda I - \alpha$ is not invertible, so its kernel is non-zero. Pick a non-zero vector $v$ in
$\mathrm{Ker}(\lambda I - \alpha)$. Then $(\lambda I - \alpha)v = 0$, so that $\alpha(v) = \lambda v$; that is, $\lambda$ is an eigenvalue
of $\alpha$.

(c) implies (b): Suppose that $\lambda$ is a root of $m_\alpha(x)$. Then $(x - \lambda)$ divides
$m_\alpha(x)$. But $m_\alpha(x)$ divides $c_\alpha(x)$, by the Cayley-Hamilton Theorem: so $(x - \lambda)$
divides $c_\alpha(x)$, whence $\lambda$ is a root of $c_\alpha(x)$.

(a) implies (c): Let $\lambda$ be an eigenvalue of $\alpha$ with eigenvector $v$. We have
$\alpha(v) = \lambda v$. By induction, $\alpha^k(v) = \lambda^k v$ for any $k$, and so $f(\alpha)(v) = f(\lambda)v$
for any polynomial $f$. Choosing $f = m_\alpha$, we have $m_\alpha(\alpha) = O$ by definition, so
$m_\alpha(\lambda)v = 0$; since $v \ne 0$, we have $m_\alpha(\lambda) = 0$, as required.

Using this result, we can give a necessary and sufficient condition for $\alpha$ to be
diagonalisable. First, a lemma.

Lemma 4.9 Let $v_1, \ldots, v_r$ be eigenvectors of $\alpha$ with distinct eigenvalues $\lambda_1, \ldots, \lambda_r$.
Then $v_1, \ldots, v_r$ are linearly independent.

Proof Suppose that $v_1, \ldots, v_r$ are linearly dependent, so that there exists a linear
relation

\[ c_1v_1 + \cdots + c_rv_r = 0, \]

with coefficients $c_i$ not all zero. Some of these coefficients may be zero; choose a
relation with the smallest number of non-zero coefficients. Suppose that $c_1 \ne 0$.
(If $c_1 = 0$ just re-number.) Now acting on the given relation with $\alpha$, using the fact
that $\alpha(v_i) = \lambda_i v_i$, we get

\[ c_1\lambda_1v_1 + \cdots + c_r\lambda_rv_r = 0. \]

Subtracting $\lambda_1$ times the first equation from the second, we get

\[ c_2(\lambda_2 - \lambda_1)v_2 + \cdots + c_r(\lambda_r - \lambda_1)v_r = 0. \]

Now this equation has fewer non-zero coefficients than the one we started with,
which was assumed to have the smallest possible number. So the coefficients in
this equation must all be zero. That is, $c_i(\lambda_i - \lambda_1) = 0$, so $c_i = 0$ (since $\lambda_i \ne \lambda_1$),
for $i = 2, \ldots, r$. This doesn't leave much of the original equation, only $c_1v_1 = 0$,
from which we conclude that $c_1 = 0$, contrary to our assumption. So the vectors
must have been linearly independent.

Theorem 4.10 The linear map $\alpha$ on $V$ is diagonalisable if and only if its
minimal polynomial is the product of distinct linear factors, that is, its roots all have
multiplicity 1.

Proof Suppose first that $\alpha$ is diagonalisable, with distinct eigenvalues $\lambda_1, \ldots, \lambda_r$. Then
there is a basis such that $\alpha$ is represented by a diagonal matrix $D$ whose diagonal
entries are the eigenvalues. Now for any polynomial $f$, $f(\alpha)$ is represented by
$f(D)$, a diagonal matrix whose diagonal entries are $f(\lambda_i)$ for $i = 1, \ldots, r$. Choose

\[ f(x) = (x - \lambda_1) \cdots (x - \lambda_r). \]

Then all the diagonal entries of $f(D)$ are zero; so $f(D) = O$. We claim that $f$ is
the minimal polynomial of $\alpha$; clearly it has no repeated roots, so we will be done.
We know that each $\lambda_i$ is a root of $m_\alpha(x)$, so that $f(x)$ divides $m_\alpha(x)$; and we also
know that $f(\alpha) = O$, so that the degree of $f$ cannot be smaller than that of $m_\alpha$. So
the claim follows.

Conversely, we have to show that if $m_\alpha$ is a product of distinct linear factors
then $\alpha$ is diagonalisable. This is a little argument with polynomials. Let $f(x) =
\prod (x - \lambda_i)$ be the minimal polynomial of $\alpha$, with the roots $\lambda_i$ all distinct. Let
$h_i(x) = f(x)/(x - \lambda_i)$. Then the polynomials $h_1, \ldots, h_r$ have no common factor
except 1; for the only possible factors are $(x - \lambda_i)$, but this fails to divide $h_i$. Now
the Euclidean algorithm shows that we can write the h.c.f. as a linear combination:

\[ 1 = \sum_{i=1}^{r} h_i(x)k_i(x). \]

Let $U_i = \mathrm{Im}(h_i(\alpha))$. The non-zero vectors in $U_i$ are eigenvectors of $\alpha$ with eigenvalue $\lambda_i$;
for if $u \in U_i$, say $u = h_i(\alpha)v$, then

\[ (\alpha - \lambda_i I)u = (\alpha - \lambda_i I)h_i(\alpha)(v) = f(\alpha)v = 0, \]

so that $\alpha(u) = \lambda_i u$. Moreover every vector can be written as a sum of vectors
from the subspaces $U_i$. For, given $v \in V$, we have

\[ v = Iv = \sum_{i=1}^{r} h_i(\alpha)(k_i(\alpha)v), \]

with $h_i(\alpha)(k_i(\alpha)v) \in \mathrm{Im}(h_i(\alpha))$. The fact that the expression is unique follows
from the lemma, since the eigenvectors are linearly independent.
So how, in practice, do we "diagonalise" a matrix $A$, that is, find an invertible
matrix $P$ such that $P^{-1}AP = D$ is diagonal? We saw an example of this earlier. The
matrix equation can be rewritten as $AP = PD$, from which we see that the columns
of $P$ are the eigenvectors of $A$. So the procedure is: Find the eigenvalues of $A$, and
find a basis of eigenvectors; then let $P$ be the matrix which has the eigenvectors as
columns, and $D$ the diagonal matrix whose diagonal entries are the eigenvalues.
Then $P^{-1}AP = D$.
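
The procedure can be traced on the $2 \times 2$ example used earlier in the chapter. This is a minimal sketch with exact fractions, assuming the eigenvectors are already known; `matmul` and `inverse_2x2` are helper names introduced here.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Exact inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[-6, 6], [-12, 11]]
eigenvectors = [[3, 4], [2, 3]]     # eigenvalues 2 and 3 respectively

# P has the eigenvectors as its columns.
P = [[v[i] for v in eigenvectors] for i in range(2)]

# D = P^{-1} A P should be diagonal, with the eigenvalues in the same order.
D = matmul(inverse_2x2(P), matmul(A, P))
```

The order of the diagonal entries of `D` follows the order in which the eigenvectors were placed in `P`; swapping the columns of `P` swaps the diagonal entries.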

How do we find the minimal polynomial of a matrix? We know that it divides
the characteristic polynomial, and that every root of the characteristic polynomial
is a root of the minimal polynomial; then it's trial and error. For example, if the
characteristic polynomial is $(x-1)^2(x-2)^3$, then the minimal polynomial must be
one of $(x-1)(x-2)$ (this would correspond to the matrix being diagonalisable),
$(x-1)^2(x-2)$, $(x-1)(x-2)^2$, $(x-1)^2(x-2)^2$, $(x-1)(x-2)^3$ or $(x-1)^2(x-2)^3$.
If we try them in this order, the first one to be satisfied by the matrix is the minimal
polynomial.

For example, the characteristic polynomial of $A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$ is $(x-1)^2$; its
minimal polynomial is not $(x-1)$ (since $A \ne I$); so it is $(x-1)^2$.
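
The trial-and-error search can be mechanised once the characteristic polynomial has been factorised. The sketch below assumes the factorisation is supplied as a list of pairs (root, multiplicity); it tries the candidate exponent patterns in order of increasing total degree and returns the first one that annihilates the matrix. All function names are hypothetical.

```python
from itertools import product

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def shifted(A, lam):
    """Return A - lam * I."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

def evaluate(A, roots, exponents):
    """Evaluate the product of (x - root)^exponent at x = A."""
    result = identity(len(A))
    for lam, e in zip(roots, exponents):
        for _ in range(e):
            result = matmul(result, shifted(A, lam))
    return result

def minimal_exponents(A, char_factors):
    """Exponents of the minimal polynomial of A.

    char_factors lists (root, multiplicity) for the characteristic polynomial,
    which is assumed to split into linear factors with these roots.
    """
    roots = [lam for lam, _ in char_factors]
    ranges = [range(1, mult + 1) for _, mult in char_factors]
    # Every root must appear at least once; try candidates by increasing degree.
    for exponents in sorted(product(*ranges), key=sum):
        value = evaluate(A, roots, exponents)
        if all(x == 0 for row in value for x in row):
            return exponents
    return None  # only reachable if char_factors was not the true factorisation

shear = [[1, 2], [0, 1]]
shear_exponents = minimal_exponents(shear, [(1, 2)])   # exponent of (x - 1)
```

Trying candidates by increasing degree is safe because every polynomial satisfied by the matrix is a multiple of the minimal polynomial, so no satisfied candidate can have smaller degree than the minimal one.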



4.6 Jordan form 

We finish this chapter by stating without proof a canonical form for matrices over 
the complex numbers under similarity. 

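Definition 4.6 (a) A Jordan block $J(n, \lambda)$ is a matrix of the form

\[
\begin{bmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \lambda
\end{bmatrix},
\]

that is, it is an $n \times n$ matrix with $\lambda$ on the main diagonal, 1 in positions
immediately above the main diagonal, and 0 elsewhere. (We take $J(1, \lambda)$ to
be the $1 \times 1$ matrix $[\lambda]$.)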

(b) A matrix is in Jordan form if it can be written in block form with Jordan 
blocks on the diagonal and zeros elsewhere. 

Theorem 4.11 Over $\mathbb{C}$, any matrix is similar to a matrix in Jordan form; that
is, any linear map can be represented by a matrix in Jordan form relative to a
suitable basis. Moreover, the Jordan form of a matrix or linear map is unique
apart from putting the Jordan blocks in a different order on the diagonal.

Remark A matrix over $\mathbb{C}$ is diagonalisable if and only if all the Jordan blocks
in its Jordan form have size 1.

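Example Any $3 \times 3$ matrix over $\mathbb{C}$ is similar to one of

\[
\begin{bmatrix} \lambda & 0 & 0 \\ 0 & \mu & 0 \\ 0 & 0 & \nu \end{bmatrix}, \qquad
\begin{bmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \mu \end{bmatrix}, \qquad
\begin{bmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{bmatrix},
\]

for some $\lambda, \mu, \nu \in \mathbb{C}$ (not necessarily distinct).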

Example Consider the matrix $A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$, with $a, b$ real and $b \ne 0$. Its characteristic
polynomial is $x^2 - 2ax + (a^2 + b^2)$, so that the eigenvalues over $\mathbb{C}$ are $a + bi$ and $a - bi$.
Thus $A$ is diagonalisable, if we regard it as a matrix over the complex numbers.
But over the real numbers, $A$ has no eigenvalues and no eigenvectors; it is not
diagonalisable, and cannot be put into Jordan form either.

We see that there are two different "obstructions" to a matrix being diagonal- 
isable: 

(a) The roots of the characteristic polynomial don't lie in the field $K$. We can
always get around this by working in a larger field (as above, enlarge the
field from $\mathbb{R}$ to $\mathbb{C}$).

(b) Even though the characteristic polynomial factorises, there may be Jordan
blocks of size bigger than 1, so that the minimal polynomial has repeated
roots. This problem cannot be transformed away by enlarging the field; we
are stuck with what we have.

Though it is beyond the scope of this course, it can be shown that if all the roots 
of the characteristic polynomial lie in the field K, then the matrix is similar to one 
in Jordan form. 



4.7 Trace 

Here we meet another function of a linear map, and consider its relation to the 
eigenvalues and the characteristic polynomial. 



Definition 4.7 The trace Tr(A) of a square matrix A is the sum of its diagonal 
entries. 
Proposition 4.12 (a) For any two $n \times n$ matrices $A$ and $B$, we have $\mathrm{Tr}(AB) =
\mathrm{Tr}(BA)$.

(b) Similar matrices have the same trace.

Proof (a)

\[ \mathrm{Tr}(AB) = \sum_{i=1}^{n} (AB)_{ii} = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij}B_{ji}, \]

by the rules for matrix multiplication. Now obviously $\mathrm{Tr}(BA)$ is the same thing.

(b) $\mathrm{Tr}(P^{-1}AP) = \mathrm{Tr}(APP^{-1}) = \mathrm{Tr}(AI) = \mathrm{Tr}(A)$.

The second part of this proposition shows that, if $\alpha : V \to V$ is a linear map,
then any two matrices representing $\alpha$ have the same trace; so, as we did for the
determinant, we can define the trace $\mathrm{Tr}(\alpha)$ of $\alpha$ to be the trace of any matrix
representing $\alpha$.

The trace and determinant of $\alpha$ are coefficients in the characteristic
polynomial of $\alpha$.

Proposition 4.13 Let $\alpha : V \to V$ be a linear map, where $\dim(V) = n$, and let $c_\alpha$
be the characteristic polynomial of $\alpha$, a polynomial of degree $n$ with leading term
$x^n$.

(a) The coefficient of $x^{n-1}$ is $-\mathrm{Tr}(\alpha)$, and the constant term is $(-1)^n \det(\alpha)$.

(b) If $\alpha$ is diagonalisable, then the sum of its eigenvalues (counted with
multiplicity) is $\mathrm{Tr}(\alpha)$ and their product is $\det(\alpha)$.

Proof Let $A$ be a matrix representing $\alpha$. We have

\[
c_\alpha(x) = \det(xI - A) =
\begin{vmatrix}
x - a_{11} & -a_{12} & \cdots & -a_{1n} \\
-a_{21} & x - a_{22} & \cdots & -a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
-a_{n1} & -a_{n2} & \cdots & x - a_{nn}
\end{vmatrix}.
\]

The only way to obtain a term in $x^{n-1}$ in the determinant is from the product
$(x - a_{11})(x - a_{22}) \cdots (x - a_{nn})$ of diagonal entries, taking $-a_{ii}$ from the $i$th factor
and $x$ from each of the others. (If we take one off-diagonal term, we would have
to have at least two, so that the highest possible power of $x$ would be $x^{n-2}$.) So the
coefficient of $x^{n-1}$ is minus the sum of the diagonal terms.

Putting $x = 0$, we find that the constant term is $c_\alpha(0) = \det(-A) = (-1)^n \det(A)$.

If $\alpha$ is diagonalisable then the eigenvalues are the roots of $c_\alpha(x)$:

\[ c_\alpha(x) = (x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_n). \]

Now the coefficient of $x^{n-1}$ is minus the sum of the roots, and the constant term
is $(-1)^n$ times the product of the roots.
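
These statements about the trace are easy to test on the $2 \times 2$ example with eigenvalues 2 and 3. The sketch below is limited to the $2 \times 2$ case, where the characteristic polynomial is $x^2 - \mathrm{Tr}(A)x + \det(A)$; the helper names and the matrices `B` and `P` are illustrative assumptions.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Exact inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def det_2x2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def char_poly_2x2(M):
    """Coefficients (1, -Tr, det) of x^2 - Tr(M) x + det(M)."""
    return (1, -trace(M), det_2x2(M))

A = [[-6, 6], [-12, 11]]      # eigenvalues 2 and 3
B = [[1, 2], [3, 4]]          # an arbitrary second matrix
P = [[1, 2], [1, 3]]          # an arbitrary invertible matrix

similar_to_A = matmul(inverse_2x2(P), matmul(A, P))
```

Here `char_poly_2x2(A)` encodes $x^2 - 5x + 6$, whose coefficients are minus the sum and the product of the eigenvalues, in agreement with Proposition 4.13.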
Chapter 5

Linear and quadratic forms



In this chapter we examine "forms", that is, functions from a vector space $V$ to
its field, which are either linear or quadratic. The linear forms comprise the dual
space of $V$; we look at this and define dual bases and the adjoint of a linear map
(corresponding to the transpose of a matrix).

Quadratic forms make up the bulk of the chapter. We show that we can change
the basis to put any quadratic form into "diagonal form" (with squared terms only),
by a process generalising "completing the square" in elementary algebra, and that
further reductions are possible over the real and complex numbers.



5.1 Linear forms and dual space 

The definition is simple: 

Definition 5.1 Let $V$ be a vector space over $K$. A linear form on $V$ is a linear map
from $V$ to $K$, where $K$ is regarded as a 1-dimensional vector space over $K$: that is,
it is a function from $V$ to $K$ satisfying

\[ f(v_1 + v_2) = f(v_1) + f(v_2), \qquad f(cv) = cf(v) \]

for all $v_1, v_2, v \in V$ and $c \in K$.

If $\dim(V) = n$, then a linear form is represented by a $1 \times n$ matrix over $K$,
that is, a row vector of length $n$ over $K$. If $f = [\,a_1 \; a_2 \; \ldots \; a_n\,]$, then for
$v = [\,x_1 \; x_2 \; \ldots \; x_n\,]^\top$ we have

\[
f(v) = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= a_1x_1 + a_2x_2 + \cdots + a_nx_n.
\]

Conversely, any row vector of length $n$ represents a linear form on $K^n$.

Definition 5.2 Linear forms can be added and multiplied by scalars in the obvious
way:

\[ (f_1 + f_2)(v) = f_1(v) + f_2(v), \qquad (cf)(v) = cf(v). \]

So they form a vector space, which is called the dual space of $V$ and is denoted
by $V^*$.

Not surprisingly, we have: 

Proposition 5.1 If $V$ is finite-dimensional, then so is $V^*$, and $\dim(V^*) = \dim(V)$.

Proof We begin by observing that, if $(v_1, \ldots, v_n)$ is a basis for $V$, and $a_1, \ldots, a_n$
are any scalars whatsoever, then there is a unique linear map $f$ with the property
that $f(v_i) = a_i$ for $i = 1, \ldots, n$. It is given by

\[ f(c_1v_1 + \cdots + c_nv_n) = a_1c_1 + \cdots + a_nc_n, \]

in other words, it is represented by the row vector $[\,a_1 \; a_2 \; \ldots \; a_n\,]$, and its
action on $K^n$ is by matrix multiplication as we saw earlier.

Now let $f_i$ be the linear map defined by the rule that

\[ f_i(v_j) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases} \]

Then $(f_1, \ldots, f_n)$ form a basis for $V^*$; indeed, the linear form $f$ defined in the
preceding paragraph is $a_1f_1 + \cdots + a_nf_n$. This basis is called the dual basis of
$V^*$ corresponding to the given basis for $V$. Since it has $n$ elements, we see that
$\dim(V^*) = n = \dim(V)$.

We can describe the basis in the preceding proof as follows.

Definition 5.3 The Kronecker delta $\delta_{ij}$ for $i, j \in \{1, \ldots, n\}$ is defined by the rule
that

\[ \delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases} \]

Note that $\delta_{ij}$ is the $(i, j)$ entry of the identity matrix. Now, if $(v_1, \ldots, v_n)$ is a basis
for $V$, then the dual basis for the dual space $V^*$ is the basis $(f_1, \ldots, f_n)$ satisfying

\[ f_i(v_j) = \delta_{ij}. \]

There are some simple properties of the Kronecker delta with respect to
summation. For example,

\[ \sum_{i=1}^{n} \delta_{ij} a_i = a_j \]

for fixed $j \in \{1, \ldots, n\}$. This is because all terms of the sum except the term $i = j$
are zero.
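
In coordinates the dual basis can be computed directly. If the basis vectors $v_1, \ldots, v_n$ of $K^n$ are placed as the columns of a matrix $P$, then the rows of $P^{-1}$ are row vectors $f_1, \ldots, f_n$ with $f_i(v_j) = \delta_{ij}$, because $P^{-1}P = I$. The sketch below illustrates this for $n = 2$; the basis chosen and the helper names are assumptions made for the example.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Exact inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def apply_form(f, v):
    """Value of the linear form with row vector f on the column vector v."""
    return sum(a * x for a, x in zip(f, v))

basis = [[3, 4], [2, 3]]                               # v_1 and v_2
P = [[v[i] for v in basis] for i in range(2)]          # basis vectors as columns
dual_basis = inverse_2x2(P)                            # row i represents f_i

# Table of values f_i(v_j); it should be the identity matrix.
values = [[apply_form(f, v) for v in basis] for f in dual_basis]
```

The linear form with prescribed values $a_1, a_2$ on the basis is then $a_1f_1 + a_2f_2$, as in the proof of Proposition 5.1.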
5.1.1 Adjoints

Definition 5.4 Let $\alpha : V \to W$ be a linear map. There is a linear map $\alpha^* : W^* \to
V^*$ (note the reversal!) defined by

\[ (\alpha^*(f))(v) = f(\alpha(v)). \]

The map $\alpha^*$ is called the adjoint of $\alpha$.

This definition takes a bit of unpicking. We are given $\alpha : V \to W$, and asked to
define $\alpha^* : W^* \to V^*$. This means that, to any element $f \in W^*$ (any linear form on
$W$) we must associate a linear form $g = \alpha^*(f) \in V^*$. This linear form must act on
vectors $v \in V$ to produce scalars. Our definition says that $\alpha^*(f)$ maps the vector $v$
to the scalar $f(\alpha(v))$: this makes sense because $\alpha(v)$ is a vector in $W$, and hence
the linear form $f \in W^*$ can act on it to produce a scalar.

Now $\alpha^*$, being a linear map, is represented by a matrix when we choose bases
for $W^*$ and $V^*$. The obvious bases to choose are the dual bases corresponding to
some given bases of $W$ and $V$ respectively. What is the matrix? Some calculation
shows the following, which will not be proved in detail here.

Proposition 5.2 Let $\alpha : V \to W$ be a linear map. Choose bases $B$ for $V$, and $C$ for
$W$, and let $A$ be the matrix representing $\alpha$ relative to these bases. Let $B^*$ and $C^*$
denote the dual bases of $V^*$ and $W^*$ corresponding to $B$ and $C$. Then the matrix
representing $\alpha^*$ relative to the bases $C^*$ and $B^*$ is the transpose of $A$, that is, $A^\top$.



5.1.2 Change of basis 

Suppose that we change bases in $V$ from $B = (v_1, \ldots, v_n)$ to $B' = (v'_1, \ldots, v'_n)$, with
change of basis matrix $P = P_{B,B'}$. How do the dual bases change? In other words,
if $B^* = (f_1, \ldots, f_n)$ is the dual basis of $B$, and $(B')^* = (f'_1, \ldots, f'_n)$ the dual basis
of $B'$, then what is the transition matrix $P_{B^*,(B')^*}$? The next result answers the
question.

Proposition 5.3 Let $B$ and $B'$ be bases for $V$, and $B^*$ and $(B')^*$ the dual bases of
the dual space. Then

\[ P_{B^*,(B')^*} = \left( P_{B,B'}^\top \right)^{-1}. \]

Proof Use the notation from just before the Proposition. If $P = P_{B,B'}$ has $(i, j)$
entry $p_{ij}$, and $Q = P_{B^*,(B')^*}$ has $(i, j)$ entry $q_{ij}$, we have

\[ v'_i = \sum_{k=1}^{n} p_{ki} v_k, \qquad f'_j = \sum_{l=1}^{n} q_{lj} f_l, \]

and so

\[
\delta_{ij} = f'_j(v'_i)
= \sum_{l=1}^{n} \sum_{k=1}^{n} q_{lj} p_{ki} f_l(v_k)
= \sum_{k=1}^{n} q_{kj} p_{ki}.
\]

Now $q_{kj}$ is the $(j, k)$ entry of $Q^\top$, and so we have

\[ I = Q^\top P, \]

whence $Q^\top = P^{-1}$, so that $Q = (P^{-1})^\top = (P^\top)^{-1}$, as required.

5.2 Quadratic forms 

A lot of applications of mathematics involve dealing with quadratic forms: you 
meet them in statistics (analysis of variance) and mechanics (energy of rotating 
bodies), among other places. In this section we begin the study of quadratic forms. 

5.2.1 Quadratic forms 

For almost everything in the remainder of this chapter, we assume that 

the characteristic of the field K is not equal to 2. 

This means that $2 \ne 0$ in $K$, so that the element $1/2$ exists in $K$. Of our list of
"standard" fields, this only excludes $\mathbb{F}_2$, the integers mod 2. (For example, in $\mathbb{F}_5$,
we have $1/2 = 3$.)

A quadratic form is a function which, when written out in coordinates, is a
polynomial in which every term has total degree 2 in the variables. For example,

\[ q(x, y, z) = x^2 + 4xy + 2xz - 3y^2 - 2yz - z^2 \]

is a quadratic form in three variables. 

We will meet a formal definition of a quadratic form later in the chapter, but
for the moment we take the following.

Definition 5.5 A quadratic form in $n$ variables $x_1, \ldots, x_n$ over a field $K$ is a
polynomial

\[ \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j \]

in the variables in which every term has degree two (that is, is a multiple of $x_i x_j$
for some $i, j$).

In the above representation of a quadratic form, we see that if $i \neq j$, then the
term in $x_i x_j$ comes twice, so that the coefficient of $x_i x_j$ is $a_{ij} + a_{ji}$. We are free to
choose any two values for $a_{ij}$ and $a_{ji}$ as long as they have the right sum; but we
will always make the choice so that the two values are equal. That is, to obtain a
term $c x_i x_j$, we take $a_{ij} = a_{ji} = c/2$. (This is why we require that the characteristic
of the field is not 2.)

Any quadratic form is thus represented by a symmetric matrix $A$ with $(i,j)$
entry $a_{ij}$ (that is, a matrix satisfying $A = A^\top$). This is the third job of matrices in
linear algebra: symmetric matrices represent quadratic forms.

We think of a quadratic form as defined above as being a function from the
vector space $K^n$ to the field $K$. It is clear from the definition that

\[ q(x_1, \ldots, x_n) = v^\top A v, \quad \text{where } v = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}. \]

Now if we change the basis for $V$, we obtain a different representation for the
same function $q$. The effect of a change of basis is a linear substitution $v = Pv'$ on
the variables, where $P$ is the transition matrix between the bases. Thus we have

\[ v^\top A v = (Pv')^\top A (Pv') = (v')^\top (P^\top A P) v', \]

so we have the following:

Proposition 5.4 A basis change with transition matrix $P$ replaces the symmetric
matrix $A$ representing a quadratic form by the matrix $P^\top A P$.

As for other situations where matrices represented objects on vector spaces, 
we make a definition: 

Definition 5.6 Two symmetric matrices $A$, $A'$ over a field $K$ are congruent if
$A' = P^\top A P$ for some invertible matrix $P$.

Proposition 5.5 Two symmetric matrices are congruent if and only if they repre- 
sent the same quadratic form with respect to different bases. 
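
A small numerical check may make the substitution rule concrete. This is a sketch only: the matrices `A` and `P` below are illustrative choices, and the helper names are assumptions rather than anything defined in the notes. It forms $P^\top A P$ and confirms that the two matrices give the same value of the form when the coordinates are related by $v = Pv'$.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def quadratic_value(a, v):
    """Evaluate v^T A v for a square matrix a and a coordinate list v."""
    n = len(v)
    return sum(a[i][j] * v[i] * v[j] for i in range(n) for j in range(n))

def congruence(a, p):
    """Return P^T A P."""
    return matmul(matmul(transpose(p), a), p)

A = [[1, 2], [2, -3]]    # q(x, y) = x^2 + 4xy - 3y^2
P = [[1, 1], [0, 1]]     # invertible; the substitution x = x' + y', y = y'
A_new = congruence(A, P)

v_new = [2, -1]          # coordinates relative to the new basis
v_old = [sum(P[i][j] * v_new[j] for j in range(2)) for i in range(2)]   # v = P v'

same_value = quadratic_value(A, v_old) == quadratic_value(A_new, v_new)
```

Note that `A_new` is again symmetric, as it must be for any congruence.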



Our next job, as you may expect, is to find a canonical form for symmetric
matrices under congruence; that is, a choice of basis so that a quadratic form has 
a particularly simple shape. We will see that the answer to this question depends 
on the field over which we work. We will solve this problem for the fields of real 
and complex numbers. 

5.2.2 Reduction of quadratic forms 

Even if we cannot find a canonical form for quadratic forms, we can simplify them 
very greatly. 

Theorem 5.6 Let $q$ be a quadratic form in $n$ variables $x_1, \ldots, x_n$ over a field
$K$ whose characteristic is not 2. Then by a suitable linear substitution to new
variables $y_1, \ldots, y_n$, we can obtain

\[ q = c_1 y_1^2 + c_2 y_2^2 + \cdots + c_n y_n^2 \]

for some $c_1, \ldots, c_n \in K$.

Proof Our proof is by induction on $n$. We call a quadratic form which is written
as in the conclusion of the theorem diagonal. A form in one variable is certainly
diagonal, so the induction starts. Now assume that the theorem is true for forms
in $n - 1$ variables. Take

\[ q(x_1, \ldots, x_n) = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j, \]

where $a_{ij} = a_{ji}$ for $i \neq j$.

Case 1: Assume that $a_{ii} \neq 0$ for some $i$. By a permutation of the variables (which
is certainly a linear substitution), we can assume that $a_{11} \neq 0$. Let

\[ y_1 = x_1 + \sum_{i=2}^n (a_{1i}/a_{11}) x_i. \]

Then we have

\[ a_{11} y_1^2 = a_{11} x_1^2 + 2 \sum_{i=2}^n a_{1i} x_1 x_i + q'(x_2, \ldots, x_n), \]

where $q'$ is a quadratic form in $x_2, \ldots, x_n$. That is, all the terms involving $x_1$ in $q$
have been incorporated into $a_{11} y_1^2$. So we have

\[ q(x_1, \ldots, x_n) = a_{11} y_1^2 + q''(x_2, \ldots, x_n), \]

where $q''$ is the part of $q$ not containing $x_1$ minus $q'$.
By induction, there is a change of variable so that

\[ q''(x_2, \ldots, x_n) = \sum_{i=2}^n c_i y_i^2, \]

and so we are done (taking $c_1 = a_{11}$).



Case 2: All $a_{ii}$ are zero, but $a_{ij} \neq 0$ for some $i \neq j$. Now

\[ x_i x_j = \tfrac14 \left( (x_i + x_j)^2 - (x_i - x_j)^2 \right), \]

so taking $x'_i = \tfrac12 (x_i + x_j)$ and $x'_j = \tfrac12 (x_i - x_j)$, we obtain a new form for $q$ which
does contain a non-zero diagonal term. Now we apply the method of Case 1.

Case 3: All $a_{ij}$ are zero. Now $q$ is the zero form, and there is nothing to prove:
take $c_1 = \cdots = c_n = 0$.
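
The proof is constructive, and its three cases can be turned into a procedure acting on the symmetric matrix of the form. What follows is a minimal sketch over the rationals, not a definitive implementation: the function name `diagonalise_by_congruence` and its helpers are hypothetical, and in Case 2 it uses the substitution "add variable $j$ to variable $i$" (which produces the diagonal term $2a_{ij}$) rather than the exact substitution written in the proof. It returns an invertible $P$ and a diagonal $D$ with $P^\top A P = D$.

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def diagonalise_by_congruence(a):
    """Return (p, d) with p invertible, d diagonal and p^T a p = d.

    `a` must be a symmetric square matrix with rational entries.
    """
    n = len(a)
    d = [[Fraction(x) for x in row] for row in a]
    p = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

    def add_column(target, source, c):
        # Column operation col[target] += c * col[source] on d and p, then the
        # matching row operation on d, so that p^T a p = d stays true.
        for row in d:
            row[target] += c * row[source]
        for row in p:
            row[target] += c * row[source]
        d[target] = [x + c * y for x, y in zip(d[target], d[source])]

    def swap(i, j):
        # Permutation of the variables x_i and x_j.
        for row in d:
            row[i], row[j] = row[j], row[i]
        for row in p:
            row[i], row[j] = row[j], row[i]
        d[i], d[j] = d[j], d[i]

    for k in range(n):
        if d[k][k] == 0:
            i = next((i for i in range(k + 1, n) if d[i][i] != 0), None)
            if i is None:
                # Case 2: no non-zero diagonal term is left.
                pair = next(((r, s) for r in range(k, n)
                             for s in range(r + 1, n) if d[r][s] != 0), None)
                if pair is None:
                    break          # Case 3: the remaining form is zero.
                r, s = pair
                add_column(r, s, Fraction(1))   # new d[r][r] = 2 * d[r][s] != 0
                i = r
            if i != k:
                swap(k, i)
        # Case 1: complete the square in the k-th variable.
        for j in range(k + 1, n):
            if d[k][j] != 0:
                add_column(j, k, -d[k][j] / d[k][k])
    return p, d

A = [[1, 1, 2], [1, 1, 0], [2, 0, 4]]   # the matrix of Example 5.1
P, D = diagonalise_by_congruence(A)
```

For this matrix the sketch produces the diagonal entries $1, -4, 1$. These differ from the entries $1, 1, -1$ found by hand in Example 5.1 only by square factors, which is exactly the freedom discussed after the example.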

Example 5.1 Consider the quadratic form $q(x,y,z) = x^2 + 2xy + 4xz + y^2 + 4z^2$.
We have

\[ (x + y + 2z)^2 = x^2 + 2xy + 4xz + y^2 + 4z^2 + 4yz, \]

and so

\begin{align*}
q &= (x + y + 2z)^2 - 4yz \\
  &= (x + y + 2z)^2 - (y + z)^2 + (y - z)^2 \\
  &= u^2 + v^2 - w^2,
\end{align*}

where $u = x + y + 2z$, $v = y - z$, $w = y + z$. Otherwise said, the matrix representing
the quadratic form, namely

\[ A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 1 & 0 \\ 2 & 0 & 4 \end{bmatrix}, \]

is congruent to the matrix

\[ A' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}. \]

Can you find an invertible matrix $P$ such that $P^\top A P = A'$?

Thus any quadratic form can be reduced to the diagonal shape

\[ \alpha_1 x_1^2 + \cdots + \alpha_n x_n^2 \]

by a linear substitution. But this is still not a "canonical form for congruence".
For example, if $y_1 = x_1/c$, then $\alpha_1 x_1^2 = (\alpha_1 c^2) y_1^2$. In other words, we can multiply
any $\alpha_i$ by any factor which is a perfect square in $K$.

Over the complex numbers $\mathbb{C}$, every element has a square root. Suppose that
$\alpha_1, \ldots, \alpha_r \neq 0$, and $\alpha_{r+1} = \cdots = \alpha_n = 0$. Putting

\[ y_i = \begin{cases} (\sqrt{\alpha_i})\, x_i & \text{for } 1 \le i \le r, \\ x_i & \text{for } r+1 \le i \le n, \end{cases} \]

we have

\[ q = y_1^2 + \cdots + y_r^2. \]

We will see later that $r$ is an "invariant" of $q$: however we do the reduction, we
arrive at the same value of $r$.

Over the real numbers $\mathbb{R}$, things are not much worse. Since any positive real
number has a square root, we may suppose that $\alpha_1, \ldots, \alpha_s > 0$, $\alpha_{s+1}, \ldots, \alpha_{s+t} < 0$,
and $\alpha_{s+t+1}, \ldots, \alpha_n = 0$. Now putting

\[ y_i = \begin{cases} (\sqrt{\alpha_i})\, x_i & \text{for } 1 \le i \le s, \\ (\sqrt{-\alpha_i})\, x_i & \text{for } s+1 \le i \le s+t, \\ x_i & \text{for } s+t+1 \le i \le n, \end{cases} \]

we get

\[ q = y_1^2 + \cdots + y_s^2 - y_{s+1}^2 - \cdots - y_{s+t}^2. \]

Again, we will see later that $s$ and $t$ don't depend on how we do the reduction.
[This is the theorem known as Sylvester's Law of Inertia.]

5.2.3 Quadratic and bilinear forms 

The formal definition of a quadratic form looks a bit different from the version we 
gave earlier, though it amounts to the same thing. First we define a bilinear form. 

Definition 5.7 (a) Let $b : V \times V \to K$ be a function of two variables from $V$
with values in $K$. We say that $b$ is a bilinear form if it is a linear function of
each variable when the other is kept constant: that is,

\[ b(v, w_1 + w_2) = b(v, w_1) + b(v, w_2), \qquad b(v, cw) = c\, b(v, w), \]

with two similar equations involving the first variable. A bilinear form $b$ is
symmetric if $b(v, w) = b(w, v)$ for all $v, w \in V$.

(b) Let $q : V \to K$ be a function. We say that $q$ is a quadratic form if

- $q(cv) = c^2 q(v)$ for all $c \in K$, $v \in V$;

- the function $b$ defined by

\[ b(v, w) = \tfrac12 \left( q(v + w) - q(v) - q(w) \right) \]

is a bilinear form on $V$.

Remarks The bilinear form in the second part is symmetric; and the division 
by 2 in the definition is permissible because of our assumption that the character- 
istic of K is not 2. 

If we think of the prototype of a quadratic form as being the function $x^2$, then
the first equation says $(cx)^2 = c^2 x^2$, while the second has the form

\[ \tfrac12 \left( (x + y)^2 - x^2 - y^2 \right) = xy, \]

and $xy$ is the prototype of a bilinear form: it is a linear function of $x$ when $y$ is
constant, and vice versa.

Note that the formula $b(x, y) = \tfrac12 (q(x + y) - q(x) - q(y))$ (which is known as
the polarisation formula) says that the bilinear form is determined by the quadratic
form. Conversely, if we know the symmetric bilinear form $b$, then we have

\[ 2q(v) = 4q(v) - 2q(v) = q(v + v) - q(v) - q(v) = 2b(v, v), \]

so that $q(v) = b(v, v)$, and we see that the quadratic form is determined by the
symmetric bilinear form. So these are equivalent objects.

If $b$ is a symmetric bilinear form on $V$ and $B = (v_1, \ldots, v_n)$ is a basis for $V$,
then we can represent $b$ by the $n \times n$ matrix $A$ whose $(i,j)$ entry is $a_{ij} = b(v_i, v_j)$.
Note that $A$ is a symmetric matrix. It is easy to see that this is the same as the
matrix representing the quadratic form.
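
As an illustration of polarisation, the sketch below recovers the symmetric matrix of the form $q(x,y,z) = x^2 + 4xy + 2xz - 3y^2 - 2yz - z^2$ from the start of this section by evaluating $b(e_i, e_j)$ on the standard basis. The function names `polar` and `gram_matrix` are assumptions made for this example.

```python
from fractions import Fraction

def q(v):
    x, y, z = v
    return x * x + 4 * x * y + 2 * x * z - 3 * y * y - 2 * y * z - z * z

def polar(form, v, w):
    """Polarisation: b(v, w) = (q(v + w) - q(v) - q(w)) / 2."""
    total = [a + b for a, b in zip(v, w)]
    return Fraction(form(total) - form(v) - form(w), 2)

def gram_matrix(form, n):
    """Matrix with (i, j) entry b(e_i, e_j) for the standard basis e_1, ..., e_n."""
    basis = [[int(i == j) for j in range(n)] for i in range(n)]
    return [[polar(form, basis[i], basis[j]) for j in range(n)] for i in range(n)]

A = gram_matrix(q, 3)

# Going back: q(v) should equal v^T A v.
v = [1, 2, 3]
value_from_matrix = sum(A[i][j] * v[i] * v[j] for i in range(3) for j in range(3))
```

The off-diagonal entries of `A` are half the coefficients of the mixed terms, in line with the convention $a_{ij} = a_{ji} = c/2$ adopted earlier.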

Here is a third way of thinking about a quadratic form. Let $V^*$ be the dual
space of $V$, and let $\alpha : V \to V^*$ be a linear map. Then for $v \in V$, we have $\alpha(v) \in V^*$,
and so $\alpha(v)(w)$ is an element of $K$. The function

\[ b(v, w) = \alpha(v)(w) \]

is a bilinear form on $V$. If $\alpha(v)(w) = \alpha(w)(v)$ for all $v, w \in V$, then this bilinear
form is symmetric. Conversely, a symmetric bilinear form $b$ gives rise to a linear
map $\alpha : V \to V^*$ satisfying $\alpha(v)(w) = \alpha(w)(v)$, by the rule that $\alpha(v)$ is the linear
map $w \mapsto b(v, w)$.

Now given $\alpha : V \to V^*$, choose a basis $B$ for $V$, and let $B^*$ be the dual basis
for $V^*$. Then $\alpha$ is represented by a matrix $A$ relative to the bases $B$ and $B^*$.
Summarising:

Proposition 5.7 The following objects are equivalent on a vector space over a
field whose characteristic is not 2:

(a) a quadratic form on $V$;

(b) a symmetric bilinear form on $V$;

(c) a linear map $\alpha : V \to V^*$ satisfying $\alpha(v)(w) = \alpha(w)(v)$ for all $v, w \in V$.

Moreover, if corresponding objects of these three types are represented by
matrices as described above, then we get the same matrix $A$ in each case. Also, a
change of basis in $V$ with transition matrix $P$ replaces $A$ by $P^\top A P$.

Proof Only the last part needs proof. We have seen it for a quadratic form, and
the argument for a bilinear form is the same. So suppose that $\alpha : V \to V^*$, and
we change from $B$ to $B'$ in $V$ with transition matrix $P$. We saw that the transition
matrix between the dual bases in $V^*$ is $(P^\top)^{-1}$. Now go back to the discussion
of linear maps between different vector spaces in Chapter 4. If $\alpha : V \to W$ and
we change bases in $V$ and $W$ with transition matrices $P$ and $Q$, then the matrix
$A$ representing $\alpha$ is changed to $Q^{-1} A P$. Apply this with $Q = (P^\top)^{-1}$, so that
$Q^{-1} = P^\top$, and we see that the new matrix is $P^\top A P$, as required.

5.2.4 Canonical forms for complex and real forms 

Finally, in this section, we return to quadratic forms (or symmetric matrices) over
the real and complex numbers, and find canonical forms under congruence. Recall
that two symmetric matrices $A$ and $A'$ are congruent if $A' = P^\top A P$ for some
invertible matrix $P$; as we have seen, this is the same as saying that they represent
the same quadratic form relative to different bases.

Theorem 5.8 Any $n \times n$ complex symmetric matrix $A$ is congruent to a matrix of
the form

\[ \begin{bmatrix} I_r & O \\ O & O \end{bmatrix} \]

for some $r$. Moreover, $r = \operatorname{rank}(A)$, and so if $A$ is congruent to two matrices of this
form then they both have the same value of $r$.

Proof We already saw that $A$ is congruent to a matrix of this form. Moreover, if
$P$ is invertible, then so is $P^\top$, and so

\[ r = \operatorname{rank}(P^\top A P) = \operatorname{rank}(A) \]

as claimed.

The next result is Sylvester's Law of Inertia.

Theorem 5.9 Any $n \times n$ real symmetric matrix $A$ is congruent to a matrix of the
form

\[ \begin{bmatrix} I_s & O & O \\ O & -I_t & O \\ O & O & O \end{bmatrix} \]

for some $s, t$. Moreover, if $A$ is congruent to two matrices of this form, then they
have the same values of $s$ and of $t$.



Proof Again we have seen that $A$ is congruent to a matrix of this form. Arguing
as in the complex case, we see that $s + t = \operatorname{rank}(A)$, and so any two matrices of
this form congruent to $A$ have the same values of $s + t$.

Suppose that two different reductions give the values $s, t$ and $s', t'$ respectively,
with $s + t = s' + t' = \operatorname{rank}(A)$. Suppose for a contradiction that $s < s'$. Now let $q$ be the
quadratic form represented by $A$. Then we are told that there are linear functions
$y_1, \ldots, y_n$ and $z_1, \ldots, z_n$ of the original variables $x_1, \ldots, x_n$ of $q$ such that

\[ q = y_1^2 + \cdots + y_s^2 - y_{s+1}^2 - \cdots - y_{s+t}^2 = z_1^2 + \cdots + z_{s'}^2 - z_{s'+1}^2 - \cdots - z_{s'+t'}^2. \]

Now consider the equations

\[ y_1 = 0, \ldots, y_s = 0, \quad z_{s'+1} = 0, \ldots, z_n = 0 \]

regarded as linear equations in the original variables $x_1, \ldots, x_n$. The number of
equations is $s + (n - s') = n - (s' - s) < n$. According to a lemma from much earlier
in the course (we used it in the proof of the Exchange Lemma!), the equations
have a non-zero solution. That is, there are values of $x_1, \ldots, x_n$, not all zero, such
that the variables $y_1, \ldots, y_s$ and $z_{s'+1}, \ldots, z_n$ are all zero.
Since $y_1 = \cdots = y_s = 0$, we have for these values

\[ q = -y_{s+1}^2 - \cdots - y_{s+t}^2 \le 0. \]

But since $z_{s'+1} = \cdots = z_n = 0$, we also have

\[ q = z_1^2 + \cdots + z_{s'}^2 \ge 0. \]

Hence $q = 0$, and so $z_1 = \cdots = z_{s'} = 0$ as well; that is, every $z_i$ vanishes. Since the
substitution from the $x_i$ to the $z_i$ is invertible, this forces $x_1 = \cdots = x_n = 0$, which
is a contradiction. So we cannot have $s < s'$. Similarly we cannot have
$s' < s$ either. So we must have $s = s'$, as required to be proved.

We saw that $s + t$ is the rank of $A$. The number $s - t$ is known as the signature of
$A$. Of course, both the rank and the signature are independent of how we reduce
the matrix (or quadratic form); and if we know the rank and signature, we can
easily recover $s$ and $t$.

You will meet some further terminology in association with Sylvester's Law of
Inertia. Let $q$ be a quadratic form in $n$ variables represented by the real symmetric
matrix $A$. Let $q$ (or $A$) have rank $s + t$ and signature $s - t$, that is, have $s$ positive
and $t$ negative terms in its diagonal form. We say that $q$ (or $A$) is

• positive definite if $s = n$ (and $t = 0$), that is, if $q(v) \ge 0$ for all $v$, with equality
only if $v = 0$;

• positive semidefinite if $t = 0$, that is, if $q(v) \ge 0$ for all $v$;

• negative definite if $t = n$ (and $s = 0$), that is, if $q(v) \le 0$ for all $v$, with
equality only if $v = 0$;

• negative semidefinite if $s = 0$, that is, if $q(v) \le 0$ for all $v$;

• indefinite if $s > 0$ and $t > 0$, that is, if $q(v)$ takes both positive and negative
values.
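
Once a real form has been brought to diagonal shape, all of this terminology can be read off from the signs of the diagonal coefficients. The sketch below assumes the coefficients are supplied as a list; the function names are hypothetical. Because the classes overlap (a positive definite form is also positive semidefinite), `classify` returns the set of all labels that apply.

```python
def count_signs(diagonal):
    """Return (s, t): the numbers of positive and negative coefficients."""
    s = sum(1 for c in diagonal if c > 0)
    t = sum(1 for c in diagonal if c < 0)
    return s, t

def rank_and_signature(diagonal):
    s, t = count_signs(diagonal)
    return s + t, s - t

def recover_s_t(rank, signature):
    """Invert the map (s, t) -> (s + t, s - t)."""
    return (rank + signature) // 2, (rank - signature) // 2

def classify(diagonal):
    n = len(diagonal)
    s, t = count_signs(diagonal)
    labels = set()
    if s == n:
        labels.add("positive definite")
    if t == 0:
        labels.add("positive semidefinite")
    if t == n:
        labels.add("negative definite")
    if s == 0:
        labels.add("negative semidefinite")
    if s > 0 and t > 0:
        labels.add("indefinite")
    return labels
```

For instance, `classify([1, 1, -1])` reports that the form of Example 5.1 is indefinite, with rank 3 and signature 1.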



Chapter 6 

Inner product spaces 



Ordinary Euclidean space is a 3-dimensional vector space over $\mathbb{R}$, but it is more
than that: the extra geometric structure (lengths, angles, etc.) can all be derived
from a special kind of bilinear form on the space known as an inner product. We
examine inner product spaces and their linear maps in this chapter.

One can also define inner products for complex vector spaces, but things are 
a bit different: we have to use a form which is not quite bilinear. We defer this to 
Chapter 8. 

6.1 Inner products and orthonormal bases 

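Definition 6.1 An inner product on a real vector space $V$ is a function
$b : V \times V \to \mathbb{R}$ satisfying

• $b$ is bilinear (that is, $b$ is linear in the first variable when the second is kept
constant and vice versa);

• $b$ is symmetric, that is, $b(v, w) = b(w, v)$ for all $v, w \in V$ (this is used below when
we expand $(v + xw) \cdot (v + xw)$ and when we apply polarisation);

• $b$ is positive definite, that is, $b(v, v) \ge 0$ for all $v \in V$, and $b(v, v) = 0$ if and
only if $v = 0$.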

We usually write $b(v, w)$ as $v \cdot w$. An inner product is sometimes called a dot
product (because of this notation).

Geometrically, in a real vector space, we define $v \cdot w = |v|.|w| \cos\theta$, where $|v|$
and $|w|$ are the lengths of $v$ and $w$, and $\theta$ is the angle between $v$ and $w$. Of course
this definition doesn't work if either $v$ or $w$ is zero, but in this case $v \cdot w = 0$. But
it is much easier to reverse the process. Given an inner product on $V$, we define

\[ |v| = \sqrt{v \cdot v} \]

for any vector $v \in V$; and, if $v, w \neq 0$, then we define the angle between them to be
$\theta$, where

\[ \cos\theta = \frac{v \cdot w}{|v|.|w|}. \]

For this definition to make sense, we need to know that

\[ -|v|.|w| \le v \cdot w \le |v|.|w| \]

for any vectors $v, w$ (since $\cos\theta$ lies between $-1$ and $1$). This is the content of the
Cauchy-Schwarz inequality:

Theorem 6.1 If $v, w$ are vectors in an inner product space then

\[ (v \cdot w)^2 \le (v \cdot v)(w \cdot w). \]

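Proof By definition, we have $(v + xw) \cdot (v + xw) \ge 0$ for any real number $x$.
Expanding, we obtain

\[ x^2 (w \cdot w) + 2x (v \cdot w) + (v \cdot v) \ge 0. \]

If $w = 0$ the inequality to be proved is trivial, so assume $w \neq 0$; then this is a
quadratic function in $x$. Since it is non-negative for all real $x$, either it has
no real roots, or it has two equal real roots; thus its discriminant
$4(v \cdot w)^2 - 4(v \cdot v)(w \cdot w)$ is non-positive, that is,

\[ (v \cdot w)^2 - (v \cdot v)(w \cdot w) \le 0, \]

as required.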

There is essentially only one kind of inner product on a real vector space. 

Definition 6.2 A basis $(v_1, \ldots, v_n)$ for an inner product space is called orthonormal
if $v_i \cdot v_j = \delta_{ij}$ (the Kronecker delta) for $1 \le i, j \le n$.

Remark: If vectors $v_1, \ldots, v_n$ satisfy $v_i \cdot v_j = \delta_{ij}$, then they are necessarily
linearly independent. For suppose that $c_1 v_1 + \cdots + c_n v_n = 0$. Taking the inner product
of this equation with $v_i$, we find that $c_i = 0$, for all $i$.

Theorem 6.2 Let $\cdot$ be an inner product on a real vector space $V$. Then there is an
orthonormal basis $(v_1, \ldots, v_n)$ for $V$. If we represent vectors in coordinates with
respect to this basis, say $v = [x_1\ x_2\ \ldots\ x_n]^\top$ and $w = [y_1\ y_2\ \ldots\ y_n]^\top$,
then

\[ v \cdot w = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n. \]

Proof This follows from our reduction of quadratic forms in the last chapter.
Since the inner product is bilinear, the function $q(v) = v \cdot v = |v|^2$ is a quadratic
form, and so it can be reduced to the form

\[ q = x_1^2 + \cdots + x_s^2 - x_{s+1}^2 - \cdots - x_{s+t}^2. \]

Now we must have $s = n$ and $t = 0$. For, if $t > 0$, then the $(s+1)$st basis vector $v_{s+1}$
satisfies $v_{s+1} \cdot v_{s+1} = -1$; while if $s + t < n$, then the $n$th basis vector $v_n$ satisfies
$v_n \cdot v_n = 0$. Either of these would contradict the positive definiteness of the inner
product. Now we have

\[ q(x_1, \ldots, x_n) = x_1^2 + \cdots + x_n^2, \]

and by polarisation we find that

\[ b((x_1, \ldots, x_n), (y_1, \ldots, y_n)) = x_1 y_1 + \cdots + x_n y_n, \]

as required.

However, it is possible to give a more direct proof of the theorem; this is 
important because it involves a constructive method for finding an orthonormal 
basis, known as the Gram-Schmidt process. 

Let $w_1, \ldots, w_n$ be any basis for $V$. The Gram-Schmidt process works as follows.

• Since $w_1 \neq 0$, we have $w_1 \cdot w_1 > 0$, that is, $|w_1| > 0$. Put $v_1 = w_1/|w_1|$; then
$|v_1| = 1$, that is, $v_1 \cdot v_1 = 1$.

• For $i = 2, \ldots, n$, let $w'_i = w_i - (v_1 \cdot w_i) v_1$. Then

\[ v_1 \cdot w'_i = v_1 \cdot w_i - (v_1 \cdot w_i)(v_1 \cdot v_1) = 0 \]

for $i = 2, \ldots, n$.

• Now apply the Gram-Schmidt process recursively to $(w'_2, \ldots, w'_n)$.

Since we replace these vectors by linear combinations of themselves, their inner
products with $v_1$ remain zero throughout the process. So if we end up with vectors
$v_2, \ldots, v_n$, then $v_1 \cdot v_i = 0$ for $i = 2, \ldots, n$. By induction, we can assume that
$v_i \cdot v_j = \delta_{ij}$ for $i, j = 2, \ldots, n$; by what we have said, this holds if $i$ or $j$ is 1 as well.

Definition 6.3 The inner product on $\mathbb{R}^n$ for which the standard basis is orthonormal
(that is, the one given in the theorem) is called the standard inner product on
$\mathbb{R}^n$.

Example 6.1 In $\mathbb{R}^3$ (with the standard inner product), apply the Gram-Schmidt
process to the vectors $w_1 = [1\ 2\ 2]^\top$, $w_2 = [1\ 1\ 0]^\top$, $w_3 = [1\ 0\ 0]^\top$.
To simplify things, I will write $(a_1, a_2, a_3)$ instead of $[a_1\ a_2\ a_3]^\top$.
We have $w_1 \cdot w_1 = 9$, so in the first step we put

\[ v_1 = \tfrac13 w_1 = (\tfrac13, \tfrac23, \tfrac23). \]

Now $v_1 \cdot w_2 = 1$ and $v_1 \cdot w_3 = \tfrac13$, so in the second step we find

\begin{align*}
w'_2 &= w_2 - v_1 = (\tfrac23, \tfrac13, -\tfrac23), \\
w'_3 &= w_3 - \tfrac13 v_1 = (\tfrac89, -\tfrac29, -\tfrac29).
\end{align*}

Now we apply Gram-Schmidt recursively to $w'_2$ and $w'_3$. We have $w'_2 \cdot w'_2 = 1$,
so $v_2 = w'_2 = (\tfrac23, \tfrac13, -\tfrac23)$. Then $v_2 \cdot w'_3 = \tfrac23$, so

\[ w''_3 = w'_3 - \tfrac23 v_2 = (\tfrac49, -\tfrac49, \tfrac29). \]

Finally, $w''_3 \cdot w''_3 = \tfrac49$, so $v_3 = \tfrac32 w''_3 = (\tfrac23, -\tfrac23, \tfrac13)$.

Check that the three vectors we have found really do form an orthonormal 
basis. 
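
The recursion in the Gram-Schmidt process unrolls naturally into a loop. The following is a minimal sketch for the standard inner product on $\mathbb{R}^n$ using floating-point arithmetic; the names `dot` and `gram_schmidt` are assumptions of this example, and the input vectors are assumed to be linearly independent (otherwise a division by zero occurs).

```python
import math

def dot(v, w):
    return sum(x * y for x, y in zip(v, w))

def gram_schmidt(basis):
    """Orthonormalise linearly independent vectors (standard inner product)."""
    remaining = [[float(x) for x in w] for w in basis]
    result = []
    while remaining:
        w1 = remaining.pop(0)
        norm = math.sqrt(dot(w1, w1))           # |w_1| > 0 since w_1 != 0
        v1 = [x / norm for x in w1]
        result.append(v1)
        # Replace each later vector w_i by w_i - (v_1 . w_i) v_1.
        updated = []
        for w in remaining:
            c = dot(v1, w)
            updated.append([x - c * y for x, y in zip(w, v1)])
        remaining = updated
    return result

# The vectors of Example 6.1.
vectors = gram_schmidt([[1, 2, 2], [1, 1, 0], [1, 0, 0]])

# Inner products v_i . v_j, which should be (close to) the Kronecker delta.
gram = [[dot(v, w) for w in vectors] for v in vectors]
```

Up to rounding error, `vectors` reproduces the basis found by hand in Example 6.1, and `gram` is the identity matrix.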

6.2 Adjoints and orthogonal linear maps 

We saw in the last chapter that a bilinear form on V is the same thing as a linear 
map from V to its dual space. The importance of an inner product is that the 
corresponding linear map is a bijection which maps an orthonormal basis of V to 
its dual basis in V*. 

Recall that the linear map $\alpha : V \to V^*$ corresponding to a bilinear form $b$ on
$V$ satisfies $\alpha(v)(w) = b(v, w)$; in our case, $\alpha(v)(w) = v \cdot w$. Now suppose that
$(v_1, \ldots, v_n)$ is an orthonormal basis for $V$, so that $v_i \cdot v_j = \delta_{ij}$. Then, if $\alpha(v_i) = f_i$,
we have $f_i(v_j) = \delta_{ij}$; but this is exactly the statement that $(f_1, \ldots, f_n)$ is the dual
basis to $(v_1, \ldots, v_n)$.

So, on an inner product space $V$, we have a natural way of matching up $V$ with
$V^*$.

Recall too that we defined the adjoint of $\alpha : V \to V$ to be the map $\alpha^* : V^* \to V^*$
defined by $\alpha^*(f)(v) = f(\alpha(v))$, and we showed that the matrix representing $\alpha^*$
relative to the dual basis is the transpose of the matrix representing $\alpha$ relative to
the original basis.

Translating all this to an inner product space, we have the following definition
and result:

Definition 6.4 Let $V$ be an inner product space, and $\alpha : V \to V$ a linear map. Then
the adjoint of $\alpha$ is the linear map $\alpha^* : V \to V$ defined by

\[ v \cdot \alpha^*(w) = \alpha(v) \cdot w. \]

Proposition 6.3 If $\alpha$ is represented by the matrix $A$ relative to an orthonormal
basis of $V$, then $\alpha^*$ is represented by the transposed matrix $A^\top$.

Now we define two important classes of linear maps on V. 

Definition 6.5 Let $\alpha$ be a linear map on an inner product space $V$.

(a) $\alpha$ is self-adjoint if $\alpha^* = \alpha$.

(b) $\alpha$ is orthogonal if it is invertible and $\alpha^* = \alpha^{-1}$.

Proposition 6.4 If $\alpha$ is represented by a matrix $A$ (relative to an orthonormal
basis), then

(a) $\alpha$ is self-adjoint if and only if $A$ is symmetric;

(b) $\alpha$ is orthogonal if and only if $A^\top A = I$.

Part (a) of this result shows that we have yet another equivalence relation on 
real symmetric matrices: 

Definition 6.6 Two real symmetric matrices are called orthogonally similar if 
they represent the same self-adjoint map with respect to different orthonormal 
bases. 

Then, from part (b), we see: 

Proposition 6.5 Two real symmetric matrices $A$ and $A'$ are orthogonally similar
if and only if there is an orthogonal matrix $P$ such that $A' = P^{-1} A P = P^\top A P$.

Here $P^{-1} = P^\top$ because $P$ is orthogonal. We see that orthogonal similarity is a
refinement of both similarity and congruence. We will examine self-adjoint maps
(or symmetric matrices) further in the next section.

Next we look at orthogonal maps.

Theorem 6.6 The following are equivalent for a linear map $\alpha$ on an inner product
space $V$:

(a) $\alpha$ is orthogonal;

(b) $\alpha$ preserves the inner product, that is, $\alpha(v) \cdot \alpha(w) = v \cdot w$;

(c) $\alpha$ maps an orthonormal basis of $V$ to an orthonormal basis.

Proof We have

\[ \alpha(v) \cdot \alpha(w) = v \cdot \alpha^*(\alpha(w)), \]

by the definition of adjoint; so (a) and (b) are equivalent.

Suppose that $(v_1, \ldots, v_n)$ is an orthonormal basis, that is, $v_i \cdot v_j = \delta_{ij}$. If (b)
holds, then $\alpha(v_i) \cdot \alpha(v_j) = \delta_{ij}$, so that $(\alpha(v_1), \ldots, \alpha(v_n))$ is an orthonormal basis,
and (c) holds. Conversely, suppose that (c) holds, and let $v = \sum x_i v_i$ and $w = \sum y_i v_i$
for some orthonormal basis $(v_1, \ldots, v_n)$, so that $v \cdot w = \sum x_i y_i$. We have

\[ \alpha(v) \cdot \alpha(w) = \Big( \sum x_i \alpha(v_i) \Big) \cdot \Big( \sum y_i \alpha(v_i) \Big) = \sum x_i y_i, \]

since $\alpha(v_i) \cdot \alpha(v_j) = \delta_{ij}$ by assumption; so (b) holds.

Corollary 6.7 $\alpha$ is orthogonal if and only if the columns of the matrix representing
$\alpha$ relative to an orthonormal basis themselves form an orthonormal basis.

Proof The columns of the matrix representing $\alpha$ are just the vectors $\alpha(v_1), \ldots, \alpha(v_n)$,
written in coordinates relative to $v_1, \ldots, v_n$. So this follows from the equivalence
of (a) and (c) in the theorem. Alternatively, the condition on columns shows that
$A^\top A = I$, where $A$ is the matrix representing $\alpha$, so $\alpha^* \alpha = I$, and $\alpha$ is orthogonal.
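
The corollary is easy to test in exact arithmetic. The sketch below uses the matrix whose columns are the orthonormal basis found in Example 6.1; the helper names and the two test vectors are illustrative assumptions. It checks both that $A^\top A = I$ and that the inner product is preserved, as in part (b) of Theorem 6.6.

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def dot(v, w):
    return sum(x * y for x, y in zip(v, w))

def apply(m, v):
    """Matrix-vector product."""
    return [dot(row, v) for row in m]

third = Fraction(1, 3)
Q = [[third * x for x in row] for row in [[1, 2, 2], [2, 1, -2], [2, -2, 1]]]

# Entry (i, j) of Q^T Q is the inner product of columns i and j of Q.
gram = matmul(transpose(Q), Q)
identity = [[int(i == j) for j in range(3)] for i in range(3)]

v, w = [1, -2, 5], [4, 0, -3]
preserved = dot(apply(Q, v), apply(Q, w)) == dot(v, w)
```

Working with `Fraction` avoids rounding error, so the comparison with the identity matrix is exact.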

Example Our earlier example of the Gram-Schmidt process produces the orthogonal
matrix

\[ \begin{bmatrix} 1/3 & 2/3 & 2/3 \\ 2/3 & 1/3 & -2/3 \\ 2/3 & -2/3 & 1/3 \end{bmatrix} \]

whose columns are precisely the orthonormal basis we constructed in the example.

We come to one of the most important topics of the course. In simple terms, any
real symmetric matrix is diagonalisable. But there is more to be said!

7.1 Orthogonal projections and orthogonal decompositions

We say that two vectors $v, w$ in an inner product space are orthogonal if $v \cdot w = 0$.

Definition 7.1 Let $V$ be a real inner product space, and $U$ a subspace of $V$. The
orthogonal complement of $U$ is the set of all vectors which are orthogonal to
everything in $U$:

\[ U^\perp = \{ w \in V : w \cdot u = 0 \text{ for all } u \in U \}. \]

Proposition 7.1 If $V$ is an inner product space and $U$ a subspace of $V$, with
$\dim(V) = n$ and $\dim(U) = r$, then $U^\perp$ is a subspace of $V$, and $\dim(U^\perp) = n - r$.
Moreover, $V = U \oplus U^\perp$.

Proof Proving that $U^\perp$ is a subspace is straightforward from the properties of
the inner product. If $w_1, w_2 \in U^\perp$, then $w_1 \cdot u = w_2 \cdot u = 0$ for all $u \in U$, so
$(w_1 + w_2) \cdot u = 0$ for all $u \in U$, whence $w_1 + w_2 \in U^\perp$. The argument for scalar
multiples is similar.

Now choose a basis for $U$ and extend it to a basis for $V$. Then apply the Gram-Schmidt
process to this basis (starting with the elements of the basis for $U$), to
obtain an orthonormal basis $(v_1, \ldots, v_n)$. Since the process only modifies vectors
by adding multiples of earlier vectors, the first $r$ vectors in the resulting basis will
form an orthonormal basis for $U$. The last $n - r$ vectors will be orthogonal to
$U$, and so lie in $U^\perp$; and they are clearly linearly independent. Now suppose that
$w \in U^\perp$ and $w = \sum c_i v_i$, where $(v_1, \ldots, v_n)$ is the orthonormal basis we constructed.
Then $c_i = w \cdot v_i = 0$ for $i = 1, \ldots, r$; so $w$ is a linear combination of the last $n - r$
basis vectors, which thus form a basis of $U^\perp$. Hence $\dim(U^\perp) = n - r$, as required.

Now the last statement of the proposition follows from the proof, since we
have a basis for $V$ which is a disjoint union of bases for $U$ and $U^\perp$.

Recall the connection between direct sum decompositions and projections. If
we have projections $P_1, \ldots, P_r$ whose sum is the identity and which satisfy $P_i P_j = O$
for $i \neq j$, then the space $V$ is the direct sum of their images. This can be refined
in an inner product space as follows.

Definition 7.2 Let $V$ be an inner product space. A linear map $\pi : V \to V$ is an
orthogonal projection if

(a) $\pi$ is a projection, that is, $\pi^2 = \pi$;

(b) $\pi$ is self-adjoint, that is, $\pi^* = \pi$ (where $\pi^*(v) \cdot w = v \cdot \pi(w)$ for all $v, w \in V$).

Proposition 7.2 If $\pi$ is an orthogonal projection, then $\operatorname{Ker}(\pi) = \operatorname{Im}(\pi)^\perp$.

Proof We know that $V = \operatorname{Ker}(\pi) \oplus \operatorname{Im}(\pi)$; we only have to show that these two
subspaces are orthogonal. So take $v \in \operatorname{Ker}(\pi)$, so that $\pi(v) = 0$, and $w \in \operatorname{Im}(\pi)$,
so that $w = \pi(u)$ for some $u \in V$. Then

\[ v \cdot w = v \cdot \pi(u) = \pi^*(v) \cdot u = \pi(v) \cdot u = 0, \]

as required.
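
To see the definition and the proposition at work, here is a small sketch of the orthogonal projection of $\mathbb{R}^3$ onto the line spanned by a vector $u$, whose matrix relative to the standard (orthonormal) basis is $u u^\top / (u \cdot u)$. The choice $u = (1, 2, 2)$ and all function names are illustrative assumptions.

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def dot(v, w):
    return sum(x * y for x, y in zip(v, w))

def apply(m, v):
    """Matrix-vector product."""
    return [dot(row, v) for row in m]

def projection_onto_line(u):
    """Matrix of the orthogonal projection onto span(u): u u^T / (u . u)."""
    norm_sq = dot(u, u)
    return [[Fraction(a * b, norm_sq) for b in u] for a in u]

u = [1, 2, 2]
pi = projection_onto_line(u)

is_projection = matmul(pi, pi) == pi       # pi^2 = pi
is_self_adjoint = transpose(pi) == pi      # symmetric matrix, orthonormal basis

kernel_vector = [0, 1, -1]                 # orthogonal to u
image_vector = apply(pi, [3, -1, 4])       # an element of Im(pi)
```

The vector `kernel_vector` is sent to zero by `pi` and is orthogonal to `image_vector`, in agreement with $\operatorname{Ker}(\pi) = \operatorname{Im}(\pi)^\perp$.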

Proposition 7.3 Let $\pi_1, \ldots, \pi_r$ be orthogonal projections on an inner product
space $V$ satisfying $\pi_1 + \cdots + \pi_r = I$ and $\pi_i \pi_j = O$ for $i \neq j$. Let $U_i = \operatorname{Im}(\pi_i)$
for $i = 1, \ldots, r$. Then

\[ V = U_1 \oplus U_2 \oplus \cdots \oplus U_r, \]

and if $u_i \in U_i$ and $u_j \in U_j$ with $i \neq j$, then $u_i$ and $u_j$ are orthogonal.

Proof The fact that $V$ is the direct sum of the images of the $\pi_i$ follows from
Proposition 5.2. We only have to prove the last part. So take $u_i$ and $u_j$ as in the
Proposition, say $u_i = \pi_i(v)$ and $u_j = \pi_j(w)$. Then

\[ u_i \cdot u_j = \pi_i(v) \cdot \pi_j(w) = \pi_i^*(v) \cdot \pi_j(w) = v \cdot \pi_i(\pi_j(w)) = 0, \]

where the second equality holds since $\pi_i$ is self-adjoint and the third is the
definition of the adjoint.

A direct sum decomposition satisfying the conditions of the theorem is called
an orthogonal decomposition of $V$.

Conversely, if we are given an orthogonal decomposition of $V$, then we can
find orthogonal projections satisfying the hypotheses of the theorem.

7.2 The Spectral Theorem

The main theorem can be stated in two different ways. I emphasise that these two
theorems are the same! Either of them can be referred to as the Spectral Theorem.

Theorem 7.4 If $\alpha$ is a self-adjoint linear map on a real inner product space $V$,
then the eigenspaces of $\alpha$ form an orthogonal decomposition of $V$. Hence there
is an orthonormal basis of $V$ consisting of eigenvectors of $\alpha$. Moreover, there
exist orthogonal projections $\pi_1, \ldots, \pi_r$ satisfying $\pi_1 + \cdots + \pi_r = I$ and $\pi_i \pi_j = O$
for $i \neq j$, such that

\[ \alpha = \lambda_1 \pi_1 + \cdots + \lambda_r \pi_r, \]

where $\lambda_1, \ldots, \lambda_r$ are the distinct eigenvalues of $\alpha$.

Theorem 7.5 Let $A$ be a real symmetric matrix. Then there exists an orthogonal
matrix $P$ such that $P^{-1} A P$ is diagonal. In other words, any real symmetric matrix
is orthogonally similar to a diagonal matrix.

Proof The second theorem follows from the first, since the transition matrix from
one orthonormal basis to another is an orthogonal matrix. So we concentrate on
the first theorem. It suffices to find an orthonormal basis of eigenvectors, since
all the rest follows from our remarks about projections, together with what we
already know about diagonalisable maps.

The proof will be by induction on $n = \dim(V)$. There is nothing to do if $n = 1$.
So we assume that the theorem holds for $(n - 1)$-dimensional spaces.

The first job is to show that $\alpha$ has an eigenvector.

Choose an orthonormal basis; then $\alpha$ is represented by a real symmetric matrix
$A$. Its characteristic polynomial has a root $\lambda$ over the complex numbers. (The
so-called "Fundamental Theorem of Algebra" asserts that any polynomial over $\mathbb{C}$
has a root.) We temporarily enlarge the field from $\mathbb{R}$ to $\mathbb{C}$. Now we can find a
non-zero column vector $v \in \mathbb{C}^n$ such that $Av = \lambda v$. Taking the complex conjugate,
remembering that $A$ is real, we have $A \bar v = \bar\lambda \bar v$.

If $v = [z_1\ z_2\ \cdots\ z_n]^\top$, then we have

\begin{align*}
\bar\lambda (|z_1|^2 + |z_2|^2 + \cdots + |z_n|^2) &= \bar\lambda\, \bar v^\top v \\
 &= (A \bar v)^\top v \\
 &= \bar v^\top A v \\
 &= \bar v^\top (\lambda v) \\
 &= \lambda (|z_1|^2 + |z_2|^2 + \cdots + |z_n|^2),
\end{align*}

so $(\lambda - \bar\lambda)(|z_1|^2 + |z_2|^2 + \cdots + |z_n|^2) = 0$. Since $v$ is not the zero vector, the second
factor is positive, so we must have $\bar\lambda = \lambda$, that is, $\lambda$ is real.

Now since $\alpha$ has a real eigenvalue, we can choose a real eigenvector $v$, and
(multiplying by a scalar if necessary) we can assume that $|v| = 1$.

Let $U$ be the subspace $v^\perp = \{ u \in V : v \cdot u = 0 \}$. This is a subspace of $V$ of
dimension $n - 1$. We claim that $\alpha : U \to U$. For take $u \in U$. Then

\[ v \cdot \alpha(u) = \alpha^*(v) \cdot u = \alpha(v) \cdot u = \lambda v \cdot u = 0, \]

where we use the fact that $\alpha$ is self-adjoint. So $\alpha(u) \in U$.

So $\alpha$ is a self-adjoint linear map on the $(n - 1)$-dimensional inner product
space $U$. By the inductive hypothesis, $U$ has an orthonormal basis consisting of
eigenvectors of $\alpha$. They are all orthogonal to the unit vector $v$; so, adding $v$ to the
basis, we get an orthonormal basis for $V$, and we are done.



Remark The theorem is almost a canonical form for real symmetric matrices
under the relation of orthogonal similarity. If we require that the eigenvalues
occur in decreasing order down the diagonal, then the result is a true canonical
form: each matrix is orthogonally similar to a unique diagonal matrix with this
property.

Corollary 7.6 If $\alpha$ is self-adjoint, then eigenvectors of $\alpha$ corresponding to
distinct eigenvalues are orthogonal.

Proof This follows from the theorem, but is easily proved directly. If $\alpha(v) = \lambda v$
and $\alpha(w) = \mu w$, then

\[ \lambda v \cdot w = \alpha(v) \cdot w = \alpha^*(v) \cdot w = v \cdot \alpha(w) = \mu v \cdot w, \]

so, if $\lambda \neq \mu$, then $v \cdot w = 0$.

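Example 7.1 Let

\[ A = \begin{bmatrix} 10 & 2 & 2 \\ 2 & 13 & 4 \\ 2 & 4 & 13 \end{bmatrix}. \]

The characteristic polynomial of $A$ is

\[ \begin{vmatrix} x - 10 & -2 & -2 \\ -2 & x - 13 & -4 \\ -2 & -4 & x - 13 \end{vmatrix} = (x - 9)^2 (x - 18), \]

so the eigenvalues are 9 and 18.

For eigenvalue 18 the eigenvectors satisfy

\[ \begin{bmatrix} 10 & 2 & 2 \\ 2 & 13 & 4 \\ 2 & 4 & 13 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 18x \\ 18y \\ 18z \end{bmatrix}, \]

so the eigenvectors are multiples of $[1\ 2\ 2]^\top$. Normalising, we can choose a
unit eigenvector $[\tfrac13\ \tfrac23\ \tfrac23]^\top$.

For the eigenvalue 9, the eigenvectors satisfy

\[ \begin{bmatrix} 10 & 2 & 2 \\ 2 & 13 & 4 \\ 2 & 4 & 13 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 9x \\ 9y \\ 9z \end{bmatrix}, \]

that is, $x + 2y + 2z = 0$. (This condition says precisely that the eigenvectors are
orthogonal to the eigenvector for $\lambda = 18$, as we know.) Thus the eigenspace is
2-dimensional. We need to choose an orthonormal basis for it. This can be done in
many different ways: for example, we could choose $[0\ \ 1/\sqrt2\ \ -1/\sqrt2]^\top$ and
$[-4/3\sqrt2\ \ 1/3\sqrt2\ \ 1/3\sqrt2]^\top$. Then we have an orthonormal basis of eigenvectors.
We conclude that, if

\[ P = \begin{bmatrix} 1/3 & 0 & -4/3\sqrt2 \\ 2/3 & 1/\sqrt2 & 1/3\sqrt2 \\ 2/3 & -1/\sqrt2 & 1/3\sqrt2 \end{bmatrix}, \]

then $P$ is orthogonal, and

\[ P^\top A P = \begin{bmatrix} 18 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 9 \end{bmatrix}. \]

You might like to check that the orthogonal matrix in the example in the last
chapter of the notes also diagonalises A.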
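
The suggested check can be carried out in exact arithmetic, since the orthogonal matrix from the Gram-Schmidt example has rational entries. The sketch below is one way to do it; the helper names are assumptions, and the projections `pi_18` and `pi_9` are built from the eigenvector $(1, 2, 2)$ found above. It also confirms the decomposition $A = 18\pi_1 + 9\pi_2$ promised by Theorem 7.4.

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[10, 2, 2], [2, 13, 4], [2, 4, 13]]

# Orthogonal matrix whose columns are the basis from Example 6.1.
third = Fraction(1, 3)
Q = [[third * x for x in row] for row in [[1, 2, 2], [2, 1, -2], [2, -2, 1]]]
D = matmul(matmul(transpose(Q), A), Q)

# Orthogonal projections onto the two eigenspaces.
u = [1, 2, 2]                                    # eigenvector for eigenvalue 18
pi_18 = [[Fraction(a * b, 9) for b in u] for a in u]
pi_9 = [[int(i == j) - pi_18[i][j] for j in range(3)] for i in range(3)]

recombined = [[18 * pi_18[i][j] + 9 * pi_9[i][j] for j in range(3)]
              for i in range(3)]
product = matmul(pi_18, pi_9)                    # should be the zero matrix
```

Here `D` is the diagonal matrix with entries 18, 9, 9, and `recombined` equals `A`.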



7.3 Quadratic forms revisited 

Any real quadratic form is represented by a real symmetric matrix; and, as we
have seen, orthogonal similarity is a refinement of congruence. This gives us a
new look at the reduction of real quadratic forms. Recall that any real symmetric
matrix is congruent to one of the form

\[ \begin{bmatrix} I_s & O & O \\ O & -I_t & O \\ O & O & O \end{bmatrix}, \]

where the numbers $s$ and $t$ are uniquely determined: $s + t$ is the rank, and $s - t$ the
signature, of the matrix (Sylvester's Law of Inertia).

Proposition 7.7 The rank of a real symmetric matrix is equal to the number of
non-zero eigenvalues, and the signature is the number of positive eigenvalues minus
the number of negative eigenvalues (counted according to multiplicity).

Proof Given a real symmetric matrix A, there is an orthogonal matrix P such that
$P^\top A P$ is diagonal, with diagonal entries $\lambda_1, \ldots, \lambda_n$, say. Suppose that
$\lambda_1, \ldots, \lambda_s$ are positive, $\lambda_{s+1}, \ldots, \lambda_{s+t}$ are negative, and the
remainder are zero. Let D be a diagonal matrix with diagonal entries

$$1/\sqrt{\lambda_1}, \ldots, 1/\sqrt{\lambda_s},\ 1/\sqrt{-\lambda_{s+1}}, \ldots, 1/\sqrt{-\lambda_{s+t}},\ 1, \ldots, 1.$$

Then

$$(PD)^\top A (PD) = D^\top P^\top A P D = \begin{bmatrix} I_s & O & O \\ O & -I_t & O \\ O & O & O \end{bmatrix}.$$



7.4 Simultaneous diagonalisation

There are two important theorems which allow us to diagonalise more than one
matrix at the same time. The first theorem we will consider just in the matrix
form.

Theorem 7.8 Let A and B be real symmetric matrices, and suppose that A is
positive definite. Then there exists an invertible matrix P such that $P^\top A P = I$ and
$P^\top B P$ is diagonal. Moreover, the diagonal entries of $P^\top B P$ are the roots of the
equation $\det(xA - B) = 0$.

Proof A is a real symmetric matrix, so there exists an invertible matrix $P_1$ such
that $P_1^\top A P_1$ is in the canonical form for congruence (as in Sylvester's Law of
Inertia). Since A is positive definite, this canonical form must be I; that is, $P_1^\top A P_1 = I$.

Now consider $P_1^\top B P_1 = C$. This is a real symmetric matrix; so, according to
the spectral theorem (in matrix form), we can find an orthogonal matrix $P_2$ such
that $P_2^\top C P_2 = D$ is diagonal. Moreover, $P_2$ is orthogonal, so $P_2^\top P_2 = I$.

Let $P = P_1 P_2$. Then

$$P^\top A P = P_2^\top (P_1^\top A P_1) P_2 = P_2^\top I P_2 = I,$$

and

$$P^\top B P = P_2^\top (P_1^\top B P_1) P_2 = P_2^\top C P_2 = D,$$

as required.

The diagonal entries of D are the eigenvalues of C, that is, the roots of the
equation $\det(xI - C) = 0$. Now we have

$$\det(P_1^\top)\det(xA - B)\det(P_1) = \det(P_1^\top (xA - B) P_1) = \det(x P_1^\top A P_1 - P_1^\top B P_1) = \det(xI - C),$$

and $\det(P_1^\top) = \det(P_1)$ is non-zero; so the polynomials $\det(xA - B)$ and $\det(xI - C)$
are non-zero multiples of each other and so have the same roots.
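
To see Theorem 7.8 at work, here is a small sketch for the 2 x 2 case in plain Python. The matrices A and B are illustrative choices of ours, and the shortcut used to find P (solving $(xA - B)v = 0$ for each root x and scaling so that $v^\top A v = 1$) is a convenient alternative to the two-step construction in the proof; it assumes the two roots are distinct and that the first row of $xA - B$ is non-zero.

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]    # positive definite (illustrative choice)
B = [[1.0, 0.0], [0.0, -1.0]]   # symmetric (illustrative choice)

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def form(v, M, w):
    """The number v^T M w for 2-vectors v and w."""
    return sum(v[i] * M[i][j] * w[j] for i in range(2) for j in range(2))

# det(xA - B) = p2 x^2 + p1 x + p0 for 2 x 2 matrices.
p2 = det2(A)
p1 = -(A[0][0] * B[1][1] + A[1][1] * B[0][0]
       - A[0][1] * B[1][0] - A[1][0] * B[0][1])
p0 = det2(B)
disc = math.sqrt(p1 * p1 - 4 * p2 * p0)
roots = sorted([(-p1 - disc) / (2 * p2), (-p1 + disc) / (2 * p2)])

# For each root x, pick v with (xA - B)v = 0 and scale it so that v^T A v = 1.
columns = []
for x in roots:
    M = [[x * A[i][j] - B[i][j] for j in range(2)] for i in range(2)]
    v = [-M[0][1], M[0][0]]   # kills the first row; the second follows as det M = 0
    scale = math.sqrt(form(v, A, v))
    columns.append([c / scale for c in v])

# Entries of P^T A P and P^T B P, where the columns of P are the vectors found.
PtAP = [[form(columns[i], A, columns[j]) for j in range(2)] for i in range(2)]
PtBP = [[form(columns[i], B, columns[j]) for j in range(2)] for i in range(2)]
```

For this choice, $\det(xA - B) = 3x^2 - 1$, so the diagonal entries of $P^\top B P$ are $\pm 1/\sqrt3$.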

You might meet this formula in mechanics. If a mechanical system has n coordinates
$x_1, \ldots, x_n$, then the kinetic energy is a quadratic form in the velocities
$\dot x_1, \ldots, \dot x_n$, and (from general physical principles) is positive definite (zero
velocities correspond to minimum energy); near equilibrium, the potential energy is
approximated by a quadratic function of the coordinates $x_1, \ldots, x_n$. If we
simultaneously diagonalise the matrices of the two quadratic forms, then we can solve n
separate differential equations rather than a complicated system with n variables!
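
The decoupling can be sketched as follows. This is our own illustration, under the assumptions that the equilibrium is at $x = 0$, that constant and linear terms of the potential have been dropped, and that the factor $\tfrac12$ is absorbed as shown; the matrices A and B play the roles they have in Theorem 7.8.

```latex
% Kinetic and potential energy near equilibrium (A positive definite, B symmetric):
\[
  T = \tfrac12\, \dot{x}^{\top} A\, \dot{x}, \qquad V = \tfrac12\, x^{\top} B\, x .
\]
% Equations of motion for these two quadratic forms:
\[
  A\, \ddot{x} = -B\, x .
\]
% Substitute x = P y, where P^T A P = I and P^T B P = D = diag(d_1, ..., d_n),
% and multiply on the left by P^T:
\[
  P^{\top} A P\, \ddot{y} = -P^{\top} B P\, y
  \quad\Longrightarrow\quad
  \ddot{y}_i = -d_i\, y_i \qquad (i = 1, \dots, n).
\]
```

Each of the n equations on the right involves a single unknown function $y_i$, and the numbers $d_i$ are the roots of $\det(xA - B) = 0$.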

The second theorem can be stated either for linear maps or for matrices. 

Theorem 7.9 (a) Let $\alpha$ and $\beta$ be self-adjoint maps on an inner product space
V, and suppose that $\alpha\beta = \beta\alpha$. Then there is an orthonormal basis for V
which consists of vectors which are simultaneous eigenvectors for $\alpha$ and $\beta$.

(b) Let A and B be real symmetric matrices satisfying AB = BA. Then there is
an orthogonal matrix P such that both $P^\top A P$ and $P^\top B P$ are diagonal.

Proof Statement (b) is just a translation of (a) into matrix terms; so we prove (a).

Let $\lambda_1, \ldots, \lambda_r$ be the distinct eigenvalues of $\alpha$. By the Spectral Theorem, we have
an orthogonal decomposition

$$V = U_1 \oplus \cdots \oplus U_r,$$

where $U_i$ is the $\lambda_i$-eigenspace of $\alpha$.

We claim that $\beta$ maps $U_i$ to $U_i$. For take $u \in U_i$, so that $\alpha(u) = \lambda_i u$. Then

$$\alpha(\beta(u)) = \beta(\alpha(u)) = \beta(\lambda_i u) = \lambda_i \beta(u),$$

so $\beta(u)$ is either zero or an eigenvector of $\alpha$ with eigenvalue $\lambda_i$. Hence
$\beta(u) \in U_i$, as required.

Now $\beta$ is a self-adjoint linear map on the inner product space $U_i$, and so by the
spectral theorem again, $U_i$ has an orthonormal basis consisting of eigenvectors of
$\beta$. But these vectors are also eigenvectors of $\alpha$, since they belong to $U_i$.

Finally, since we have an orthogonal decomposition, putting together all these
bases gives us an orthonormal basis of V consisting of simultaneous eigenvectors
of $\alpha$ and $\beta$.

Remark This theorem easily extends to an arbitrary set of real symmetric matrices
such that any two commute. For a finite set, the proof is by induction on
the number of matrices in the set, based on the proof just given. For an infinite
set, we use the fact that they span a finite-dimensional subspace of the space of
all real symmetric matrices; to diagonalise all the matrices in our set, it suffices to
diagonalise the matrices in a basis.
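
The proof of Theorem 7.9 can be followed on a small example. In the sketch below (plain Python; the matrices and helper names are illustrative choices of ours), A has a 2-dimensional eigenspace spanned by the first two standard basis vectors. The standard basis diagonalises A but not B, so, as in the proof, we choose eigenvectors of B inside that eigenspace.

```python
import math

A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
B = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 5.0]]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

commute = matmul(A, B) == matmul(B, A)

# Columns 1 and 2: eigenvectors of B lying in the 1-eigenspace of A.
# Column 3: the common eigenvector spanning the 2-eigenspace of A.
h = 1 / math.sqrt(2)
P = [[h, h, 0.0],
     [h, -h, 0.0],
     [0.0, 0.0, 1.0]]

Pt = transpose(P)
PtAP = matmul(matmul(Pt, A), P)   # expected diag(1, 1, 2)
PtBP = matmul(matmul(Pt, B), P)   # expected diag(1, -1, 5)
```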



Chapter 8 

The complex case 



The theory of real inner product spaces and self-adjoint linear maps has a close
parallel in the complex case. However, some changes are required. In this chapter 
we outline the complex case. Usually, the proofs are similar to those in the real 
case. 

8.1 Complex inner products 

There are no positive definite bilinear forms over the complex numbers; for we
always have $(\mathrm{i}v) \cdot (\mathrm{i}v) = -v \cdot v$.

But it is possible to modify the definitions so that everything works in the same 
way over C. 

Definition 8.1 An inner product on a complex vector space V is a map
$b : V \times V \to \mathbb{C}$ satisfying

(a) b is a linear function of its second variable, keeping the first variable
constant;

(b) $b(w,v) = \overline{b(v,w)}$, where the bar denotes complex conjugation. [It follows that
$b(v,v) \in \mathbb{R}$ for all $v \in V$.]

(c) $b(v,v) \ge 0$ for all $v \in V$, and $b(v,v) = 0$ if and only if $v = 0$.

As before, we write $b(v,w)$ as $v \cdot w$. This time, b is not linear as a function of
its first variable; in fact we have

$$b(v_1 + v_2, w) = b(v_1, w) + b(v_2, w), \qquad b(cv, w) = \bar{c}\, b(v, w)$$

for $v_1, v_2, v, w \in V$ and $c \in \mathbb{C}$. (Sometimes we say that b is semilinear (that is,
"$\tfrac12$-linear") as a function of its first variable, and describe it as a sesquilinear form
(that is, "$1\tfrac12$-linear"). A form satisfying (b) is called Hermitian, and one satisfying
(c) is positive definite. Thus an inner product is a positive definite Hermitian
sesquilinear form.)

The definition of an orthonormal basis is exactly as in the real case, and the 
Gram-Schmidt process allows us to find one with only trivial modifications. The 
standard inner product (with respect to an orthonormal basis) is given by 

$$v \cdot w = \overline{x_1}\, y_1 + \cdots + \overline{x_n}\, y_n,$$

where $v = [x_1\ \ldots\ x_n]^\top$, $w = [y_1\ \ldots\ y_n]^\top$.

The adjoint of $\alpha : V \to V$ is defined as before by the formula

$$\alpha^*(v) \cdot w = v \cdot \alpha(w),$$

but this time there is a small difference in the matrix representation: if $\alpha$ is
represented by A (relative to an orthonormal basis), then its adjoint $\alpha^*$ is represented
by $(\overline{A})^\top$. (Take the complex conjugates of all the entries in A, and then transpose.)
So

• a self-adjoint linear map is represented by a matrix A satisfying $A = (\overline{A})^\top$:
such a matrix is called Hermitian.

• a map which preserves the inner product (that is, which satisfies
$\alpha(v) \cdot \alpha(w) = v \cdot w$, or $\alpha^* = \alpha^{-1}$) is represented by a matrix A satisfying
$(\overline{A})^\top = A^{-1}$: such a matrix is called unitary.
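
These two definitions are easy to test by machine. The following is a minimal sketch in plain Python using the built-in `complex` type; the function names and the sample matrices H and U are our own illustrative choices.

```python
import math

def conj_transpose(M):
    """Conjugate every entry of M, then transpose."""
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def close(X, Y, tol=1e-12):
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def inner(v, w):
    """Standard inner product: conjugate-linear in v, linear in w."""
    return sum(x.conjugate() * y for x, y in zip(v, w))

def apply(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def is_hermitian(M):
    return close(M, conj_transpose(M))

def is_unitary(M):
    n = len(M)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return close(matmul(conj_transpose(M), M), identity)

h = 1 / math.sqrt(2)
H = [[2 + 0j, 1 - 1j], [1 + 1j, 3 + 0j]]      # equal to its conjugate transpose
U = [[h + 0j, h + 0j], [h * 1j, -h * 1j]]     # columns are orthonormal
```

A unitary matrix such as U preserves the inner product: `inner(apply(U, v), apply(U, w))` agrees with `inner(v, w)` up to rounding.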

8.2 The complex Spectral Theorem 

The spectral theorem for self-adjoint linear maps on complex inner product spaces 
is almost identical to the real version. The proof goes through virtually unchanged. 

The definition of an orthogonal projection is the same: a projection which is
self-adjoint.

Theorem 8.1 If $\alpha$ is a self-adjoint linear map on a complex inner product space
V, then the eigenspaces of $\alpha$ form an orthogonal decomposition of V. Hence there
is an orthonormal basis of V consisting of eigenvectors of $\alpha$. Moreover, there
exist orthogonal projections $\pi_1, \ldots, \pi_r$ satisfying $\pi_1 + \cdots + \pi_r = I$ and $\pi_i \pi_j = O$
for $i \ne j$, such that

$$\alpha = \lambda_1 \pi_1 + \cdots + \lambda_r \pi_r,$$

where $\lambda_1, \ldots, \lambda_r$ are the distinct eigenvalues of $\alpha$.

Theorem 8.2 Let A be a complex Hermitian matrix. Then there exists a unitary
matrix P such that $P^{-1} A P$ is diagonal.

There is one special feature of the complex case: 

Proposition 8.3 Any eigenvalue of a self-adjoint linear map on a complex inner 
product space (or of a complex Hermitian matrix) is real. 

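Proof Suppose that $\alpha$ is self-adjoint and $\alpha(v) = \lambda v$. Then

$$\lambda\, v \cdot v = v \cdot \alpha(v) = \alpha^*(v) \cdot v = \alpha(v) \cdot v = \overline{\lambda}\, v \cdot v,$$

where in the last step we use the fact that $(cv) \cdot w = \bar{c}\, v \cdot w$ for a complex inner
product. So $(\lambda - \overline{\lambda})\, v \cdot v = 0$. Since $v \ne 0$, we have $v \cdot v \ne 0$, and so
$\lambda = \overline{\lambda}$; that is, $\lambda$ is real.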
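
For a 2 x 2 Hermitian matrix the conclusion of Proposition 8.3 can also be checked directly. The following worked computation is our own illustration; a and d are real and b is an arbitrary complex number.

```latex
\[
  H = \begin{bmatrix} a & b \\ \overline{b} & d \end{bmatrix},
  \qquad
  \det(xI - H) = x^2 - (a + d)\,x + \bigl(ad - |b|^2\bigr).
\]
% The coefficients are real, and the discriminant is non-negative:
\[
  (a + d)^2 - 4\bigl(ad - |b|^2\bigr) = (a - d)^2 + 4|b|^2 \;\ge\; 0,
\]
% so both roots of the characteristic polynomial are real.
```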

We also have a theorem on simultaneous diagonalisation: 

Proposition 8.4 Let $\alpha$ and $\beta$ be self-adjoint linear maps of a complex inner product
space V, and suppose that $\alpha\beta = \beta\alpha$. Then there is an orthonormal basis for
V consisting of eigenvectors of both $\alpha$ and $\beta$.

The proof is as in the real case. You are invited to formulate the theorem in 
terms of commuting Hermitian matrices. 

8.3 Normal matrices 

The fact that the eigenvalues of a complex Hermitian matrix are real leaves open
the possibility of proving a more general version of the spectral theorem. We saw
that a real symmetric matrix is orthogonally similar to a diagonal matrix. In fact,
the converse is also true. For if A is a real $n \times n$ matrix and P is an orthogonal
matrix such that $P^\top A P = D$ is diagonal, then $A = P D P^\top$, and so

$$A^\top = P D^\top P^\top = P D P^\top = A.$$

In other words, a real matrix is orthogonally similar to a diagonal matrix if and
only if it is symmetric.

The analogous statement is not true for complex Hermitian matrices: such matrices
have real eigenvalues and so cannot be similar to non-real diagonal matrices, although
a non-real diagonal matrix is trivially unitarily similar to a diagonal matrix.

What really happens is the following. 

Definition 8.2 (a) Let $\alpha$ be a linear map on a complex inner-product space V.
We say that $\alpha$ is normal if it commutes with its adjoint: $\alpha\alpha^* = \alpha^*\alpha$.

(b) Let A be an $n \times n$ matrix over $\mathbb{C}$. We say that A is normal if it commutes
with its conjugate transpose: $A\,\overline{A}^\top = \overline{A}^\top A$.

Theorem 8.5 (a) Let $\alpha$ be a linear map on a complex inner product space V.
Then V has an orthonormal basis consisting of eigenvectors of $\alpha$ if and only
if $\alpha$ is normal.

(b) Let A be an $n \times n$ matrix over $\mathbb{C}$. Then there is a unitary matrix P such that
$P^{-1} A P$ is diagonal if and only if A is normal.

Proof As usual, the two forms of the theorem are equivalent. We prove it in the 
first form. 

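If $\alpha$ has an orthonormal basis $(v_1, \ldots, v_n)$ consisting of eigenvectors, then
$\alpha(v_i) = \lambda_i v_i$ for $i = 1, \ldots, n$, where $\lambda_i$ are eigenvalues. Relative to this basis,
$\alpha$ is represented by a diagonal matrix and $\alpha^*$ by its conjugate transpose, so we see
that $\alpha^*(v_i) = \overline{\lambda_i}\, v_i$, and so

$$\alpha\alpha^*(v_i) = \alpha^*\alpha(v_i) = \lambda_i \overline{\lambda_i}\, v_i.$$

Since $\alpha\alpha^*$ and $\alpha^*\alpha$ agree on the vectors of a basis, they are equal; so $\alpha$ is normal.
Conversely, suppose that $\alpha$ is normal. Let

$$\beta = \tfrac12(\alpha + \alpha^*), \qquad \gamma = \tfrac{1}{2\mathrm{i}}(\alpha - \alpha^*).$$

(You should compare these with the formulae $x = \tfrac12(z + \bar z)$, $y = \tfrac{1}{2\mathrm{i}}(z - \bar z)$ for the
real and imaginary parts of a complex number z. The analogy is even closer, since
clearly we have $\alpha = \beta + \mathrm{i}\gamma$.) Now we claim:

• $\beta$ and $\gamma$ are Hermitian. For

$$\beta^* = \tfrac12(\alpha^* + \alpha) = \beta,$$

$$\gamma^* = \tfrac{1}{-2\mathrm{i}}(\alpha^* - \alpha) = \gamma,$$

where we use the fact that $(c\alpha)^* = \bar{c}\,\alpha^*$.

• $\beta\gamma = \gamma\beta$. For

$$\beta\gamma = \tfrac{1}{4\mathrm{i}}\bigl(\alpha^2 - \alpha\alpha^* + \alpha^*\alpha - (\alpha^*)^2\bigr) = \tfrac{1}{4\mathrm{i}}\bigl(\alpha^2 - (\alpha^*)^2\bigr),$$

$$\gamma\beta = \tfrac{1}{4\mathrm{i}}\bigl(\alpha^2 + \alpha\alpha^* - \alpha^*\alpha - (\alpha^*)^2\bigr) = \tfrac{1}{4\mathrm{i}}\bigl(\alpha^2 - (\alpha^*)^2\bigr).$$

(Here we use the fact that $\alpha\alpha^* = \alpha^*\alpha$.)

Hence, by the Proposition at the end of the last section (Proposition 8.4), there is an
orthonormal basis B whose vectors are eigenvectors of $\beta$ and $\gamma$, and hence are
eigenvectors of $\alpha = \beta + \mathrm{i}\gamma$.

Note that the eigenvalues of $\beta$ and $\gamma$ in this proof are the real and imaginary
parts of the eigenvalues of $\alpha$.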
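
Here is a small numerical illustration of the proof, written as a sketch in plain Python. The matrix A (a normal matrix which is not Hermitian), the unitary matrix P and the helper names are our own choices; the code forms the matrices called $\beta$ and $\gamma$ in the proof and checks the diagonalisation.

```python
import math

def conj_transpose(M):
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def close(X, Y, tol=1e-12):
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

A = [[1 + 0j, -1 + 0j], [1 + 0j, 1 + 0j]]   # normal, but not Hermitian
A_star = conj_transpose(A)
is_normal = close(matmul(A, A_star), matmul(A_star, A))

# The Hermitian "real and imaginary parts" of A used in the proof.
beta = [[(A[i][j] + A_star[i][j]) / 2 for j in range(2)] for i in range(2)]
gamma = [[(A[i][j] - A_star[i][j]) / 2j for j in range(2)] for i in range(2)]

# A unitary matrix whose columns are common eigenvectors of beta and gamma.
h = 1 / math.sqrt(2)
P = [[h + 0j, h + 0j], [-h * 1j, h * 1j]]
D = matmul(matmul(conj_transpose(P), A), P)   # expected diag(1 + i, 1 - i)
```

Here $\beta = I$ has eigenvalues 1, 1 and $\gamma$ has eigenvalues 1, -1, matching the real and imaginary parts of the eigenvalues $1 \pm \mathrm{i}$ of A.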



Chapter 9 

Skew-symmetric matrices 



We spent the last three chapters looking at symmetric matrices; even then we 
could only find canonical forms for the real and complex numbers. It turns out 
that life is much simpler for skew-symmetric matrices. We find a canonical form 

for these matrices under congruence which works for any field whatever. (More 
precisely, as we will see, this statement applies to "alternating matrices", but these 
are precisely the same as skew-symmetric matrices unless the characteristic of the 
field is 2.) 

9.1 Alternating bilinear forms 

Alternating forms are as far from positive definite as they can be: 

Definition 9.1 Let V be a vector space over K. A bilinear form b on V is
alternating if $b(v,v) = 0$ for all $v \in V$.

Proposition 9.1 An alternating bilinear form b satisfies $b(w,v) = -b(v,w)$ for all
$v, w \in V$.

Proof

$$0 = b(v+w, v+w) = b(v,v) + b(v,w) + b(w,v) + b(w,w) = b(v,w) + b(w,v)$$

for any $v, w \in V$, using the definition of an alternating bilinear form.

Now here is the analogue of the Gram-Schmidt process for alternating bilinear
forms.

Theorem 9.2 Let b be an alternating bilinear form on a vector space V. Then
there is a basis $(u_1, \ldots, u_s, w_1, \ldots, w_s, z_1, \ldots, z_t)$ for V such that $b(u_i, w_i) = 1$ and
$b(w_i, u_i) = -1$ for $i = 1, \ldots, s$ and $b(x,y) = 0$ for any other choices of basis vectors
x and y.

Proof If b is identically zero, then simply choose a basis $(z_1, \ldots, z_n)$ and take
$s = 0$, $t = n$. So suppose not.

Choose a pair of vectors u and w such that $c = b(u,w) \ne 0$. Replacing w by
$w/c$, we have $b(u,w) = 1$.

We claim that u and w are linearly independent. For suppose that $cu + dw = 0$.
Then

$$0 = b(u, cu + dw) = c\,b(u,u) + d\,b(u,w) = d,$$
$$0 = b(w, cu + dw) = c\,b(w,u) + d\,b(w,w) = -c,$$

so $c = d = 0$. We take $u_1 = u$ and $w_1 = w$ as our first two basis vectors.

Now let $U = \langle u, w \rangle$ and $W = \{x : b(u,x) = b(w,x) = 0\}$. We claim that
$V = U \oplus W$. The argument just above already shows that $U \cap W = 0$, so we have
to show that $V = U + W$. So take a vector $v \in V$, and let $x = -b(w,v)u + b(u,v)w$.
Then

$$b(u,x) = -b(w,v)\,b(u,u) + b(u,v)\,b(u,w) = b(u,v),$$
$$b(w,x) = -b(w,v)\,b(w,u) + b(u,v)\,b(w,w) = b(w,v),$$

so $b(u, v-x) = b(w, v-x) = 0$. Thus $v - x \in W$. But clearly $x \in U$, and so our
assertion is proved.

Now b is an alternating bilinear form on W, and so by induction there is a
basis of the required form for W, say $(u_2, \ldots, u_s, w_2, \ldots, w_s, z_1, \ldots, z_t)$. Putting in
$u_1$ and $w_1$ gives the required basis for V.
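
The inductive proof is really an algorithm, and it can be sketched in a few lines of Python. This is our own illustration, not code from the notes: the form is given by an alternating matrix A with rational entries via $b(x,y) = x^\top A y$, exact arithmetic uses `fractions.Fraction`, and the names `form`, `symplectic_basis`, `gram_matrix` and `canonical` are hypothetical. At each step we either find a partner w for the current vector u and replace every remaining vector v by $v - x$ exactly as in the proof, or find that u pairs to zero with everything that remains and set it aside as one of the $z_i$.

```python
from fractions import Fraction

def form(A, x, y):
    """b(x, y) = x^T A y."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

def symplectic_basis(A):
    """Return (basis, s) with basis = [u_1..u_s, w_1..w_s, z_1..z_t]."""
    n = len(A)
    rest = [[Fraction(int(i == k)) for i in range(n)] for k in range(n)]
    pairs, radical = [], []
    while rest:
        u = rest.pop(0)
        k = next((k for k, w in enumerate(rest) if form(A, u, w) != 0), None)
        if k is None:
            radical.append(u)        # b(u, x) = 0 for every remaining x
            continue
        w = rest.pop(k)
        c = form(A, u, w)
        w = [wi / c for wi in w]     # now b(u, w) = 1
        # Replace each remaining v by v - x, where x = -b(w,v) u + b(u,v) w.
        new_rest = []
        for v in rest:
            bw, bu = form(A, w, v), form(A, u, v)
            new_rest.append([v[i] + bw * u[i] - bu * w[i] for i in range(n)])
        rest = new_rest
        pairs.append((u, w))
    basis = [u for u, _ in pairs] + [w for _, w in pairs] + radical
    return basis, len(pairs)

def gram_matrix(A, basis):
    return [[form(A, x, y) for y in basis] for x in basis]

def canonical(s, t):
    """Matrix of b relative to a basis as in Theorem 9.2."""
    n = 2 * s + t
    G = [[0] * n for _ in range(n)]
    for i in range(s):
        G[i][s + i], G[s + i][i] = 1, -1
    return G

A3 = [[0, 1, 2], [-1, 0, 3], [-2, -3, 0]]
A4 = [[0, 1, 2, 3], [-1, 0, 4, 5], [-2, -4, 0, 6], [-3, -5, -6, 0]]
```

For `A3` the procedure finds one pair and one vector $z_1$; for `A4` it finds two pairs. In both cases `gram_matrix(A, basis)` equals `canonical(s, t)`.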

9.2 Skew-symmetric and alternating matrices 

A matrix A is skew-symmetric if $A^\top = -A$.

A matrix A is alternating if A is skew-symmetric and has zero diagonal. If the
characteristic of the field K is not equal to 2, then any skew-symmetric matrix is
alternating; but if the characteristic is 2, then the extra condition is needed.

Recall the matrix representing a bilinear form b relative to a basis $(v_1, \ldots, v_n)$:
its $(i,j)$ entry is $b(v_i, v_j)$.

Proposition 9.3 An alternating bilinear form b on a vector space over K is
represented by an alternating matrix; and any alternating matrix represents an
alternating bilinear form. If the characteristic of K is not 2, we can replace
"alternating matrix" by "skew-symmetric matrix".

Proof This is obvious since if b is alternating then $a_{ji} = b(v_j, v_i) = -b(v_i, v_j) = -a_{ij}$
and $a_{ii} = b(v_i, v_i) = 0$.

So we can write our theorem in matrix form as follows: 

Theorem 9.4 Let A be an alternating matrix (or a skew-symmetric matrix over a
field whose characteristic is not equal to 2). Then there is an invertible matrix P
such that $P^\top A P$ is the matrix with s blocks
$\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ on the diagonal and all other
entries zero. Moreover the number s is half the rank of A, and so is independent
of the choice of P.

Proof We know that the effect of a change of basis with transition matrix P is to
replace the matrix A representing a bilinear form by $P^\top A P$. Also, the matrix in the
statement of the theorem is just the matrix representing b relative to the special
basis that we found in the preceding theorem.

This has a corollary which is a bit surprising at first sight: 

Corollary 9.5 (a) The rank of a skew-symmetric matrix (over a field of
characteristic not equal to 2) is even.

(b) The determinant of a skew-symmetric matrix (over a field of characteristic
not equal to 2) is a square, and is zero if the size of the matrix is odd.

Proof (a) The canonical form in the theorem clearly has rank 2s.

(b) If the skew-symmetric matrix A is singular then its determinant is zero,
which is a square. So suppose that it is invertible. Then its canonical form has
$s = n/2$ blocks $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ on the diagonal. Each of these
blocks has determinant 1, and hence so does the whole matrix. So
$\det(P^\top A P) = \det(P)^2 \det(A) = 1$, whence $\det(A) = 1/(\det(P)^2)$, which is a square.

If the size n of A is odd, then the rank cannot be n (by (a)), and so $\det(A) = 0$.
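
Both parts of (b) are easy to observe on small integer examples. The sketch below is our own illustration in plain Python: `det` is a naive Laplace expansion, adequate only for small matrices, and the two sample matrices are arbitrary skew-symmetric choices.

```python
import math

def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def is_perfect_square(n):
    return n >= 0 and math.isqrt(n) ** 2 == n

S3 = [[0, 1, 2],
      [-1, 0, 3],
      [-2, -3, 0]]          # odd size: determinant must vanish
S4 = [[0, 1, 2, 3],
      [-1, 0, 4, 5],
      [-2, -4, 0, 6],
      [-3, -5, -6, 0]]      # even size: determinant must be a square

d3, d4 = det(S3), det(S4)   # 0 and 64 = 8^2
```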



Remark There is a function defined on skew-symmetric matrices called the
Pfaffian, which like the determinant is a polynomial in the matrix entries, and
has the property that det(A) is the square of the Pfaffian of A: that is,
$\det(A) = (\mathrm{Pf}(A))^2$.

For example,

$$\mathrm{Pf}\begin{bmatrix} 0 & a \\ -a & 0 \end{bmatrix} = a, \qquad \mathrm{Pf}\begin{bmatrix} 0 & a & b & c \\ -a & 0 & d & e \\ -b & -d & 0 & f \\ -c & -e & -f & 0 \end{bmatrix} = af - be + cd.$$

(Check that the determinant of the second matrix is $(af - be + cd)^2$.)

9.3 Complex skew-Hermitian matrices

What if we play the same variation that led us from real symmetric to complex
Hermitian matrices? That is, we are working in a complex inner product space,
and if $\alpha$ is represented by the matrix A, then its adjoint is represented by $\overline{A}^\top$, the
conjugate transpose of A.

The matrix A is Hermitian if it is equal to its adjoint, that is, if $\overline{A}^\top = A$. So we
make the following definition:

Definition 9.2 The complex $n \times n$ matrix A is skew-Hermitian if $\overline{A}^\top = -A$.

Actually, things are very much simpler here, because of the following obser- 
vation: 

Proposition 9.6 The matrix A is skew-Hermitian if and only if $\mathrm{i}A$ is Hermitian.

Proof Try it and see!
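
For readers who want to compare with their own attempt, the computation can be written out as follows (our own working; $\overline{A}^\top$ denotes the conjugate transpose, and we use $\overline{\mathrm{i}} = -\mathrm{i}$).

```latex
\[
  \overline{(\mathrm{i}A)}^{\top} = \overline{\mathrm{i}}\;\overline{A}^{\top}
  = -\mathrm{i}\,\overline{A}^{\top}.
\]
% If A is skew-Hermitian, the right-hand side is -i(-A) = iA, so iA is Hermitian.
% Conversely, if iA is Hermitian, then
\[
  -\mathrm{i}\,\overline{A}^{\top} = \mathrm{i}A
  \quad\Longrightarrow\quad
  \overline{A}^{\top} = -A .
\]
```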

Corollary 9.7 Any skew-Hermitian matrix can be diagonalised by a unitary
matrix.

Proof This follows immediately from the Proposition preceding. 

Alternatively, a skew-Hermitian matrix is obviously normal, and the Corollary 
follows from our result about normal matrices (Theorem 8.5). 

Since the eigenvalues of a Hermitian matrix are real, we see that the eigenvalues
of a skew-Hermitian matrix are purely imaginary (or zero).



Appendix A 

Fields and vector spaces 



Fields 

A field is an algebraic structure K in which we can add and multiply elements, 
such that the following laws hold: 

Addition laws 

(FA0) For any $a, b \in K$, there is a unique element $a + b \in K$.

(FA1) For all $a, b, c \in K$, we have $a + (b + c) = (a + b) + c$.

(FA2) There is an element $0 \in K$ such that $a + 0 = 0 + a = a$ for all $a \in K$.

(FA3) For any $a \in K$, there exists $-a \in K$ such that $a + (-a) = (-a) + a = 0$.

(FA4) For any $a, b \in K$, we have $a + b = b + a$.

Multiplication laws

(FM0) For any $a, b \in K$, there is a unique element $ab \in K$.

(FM1) For all $a, b, c \in K$, we have $a(bc) = (ab)c$.

(FM2) There is an element $1 \in K$, not equal to the element 0 from (FA2), such
that $a1 = 1a = a$ for all $a \in K$.

(FM3) For any $a \in K$ with $a \ne 0$, there exists $a^{-1} \in K$ such that
$aa^{-1} = a^{-1}a = 1$.

(FM4) For any $a, b \in K$, we have $ab = ba$.

Distributive law

(D) For all $a, b, c \in K$, we have $a(b + c) = ab + ac$.

Note the similarity of the addition and multiplication laws. We say that $(K, +)$
is an abelian group if (FA0)-(FA4) hold. Then (FM0)-(FM4) say that $(K \setminus \{0\}, \cdot)$
is also an abelian group. (We have to leave out 0 because, as (FM3) says, 0 does
not have a multiplicative inverse.)

Examples of fields include $\mathbb{Q}$ (the rational numbers), $\mathbb{R}$ (the real numbers), $\mathbb{C}$
(the complex numbers), and $\mathbb{F}_p$ (the integers mod p, for p a prime number).

Associated with any field K there is a non-negative integer called its
characteristic, defined as follows. If there is a positive integer n such that
$1 + 1 + \cdots + 1 = 0$, where there are n ones in the sum, then the smallest such n is
prime. (For if $n = rs$, with $r, s > 1$, and we denote the sum of n ones by $n \cdot 1$, then

$$0 = n \cdot 1 = (r \cdot 1)(s \cdot 1);$$

by minimality of n, neither of the factors $r \cdot 1$ and $s \cdot 1$ is zero. But in a field, the
product of two non-zero elements is non-zero.) If so, then this prime number is
the characteristic of K. If no such n exists, we say that the characteristic of K is
zero.

For our important examples, $\mathbb{Q}$, $\mathbb{R}$ and $\mathbb{C}$ all have characteristic zero, while $\mathbb{F}_p$
has characteristic p.
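
The role of primality can be explored by brute force. The following sketch (plain Python; the function names are our own) works in the integers mod n for $n \ge 2$: it finds the smallest number of ones that sum to zero, checks axiom (FM3), and lists pairs of non-zero elements whose product is zero, which is exactly what the argument above exploits when $n = rs$.

```python
def additive_order_of_one(n):
    """Smallest k >= 1 such that the sum of k ones is 0 in the integers mod n (n >= 2)."""
    total, k = 1 % n, 1
    while total != 0:
        total = (total + 1) % n
        k += 1
    return k

def has_all_inverses(n):
    """Does every non-zero element of the integers mod n have a multiplicative inverse?"""
    return all(any(a * b % n == 1 for b in range(1, n)) for a in range(1, n))

def zero_divisor_pairs(n):
    """Pairs (r, s) of non-zero elements, r <= s, with r * s = 0 mod n."""
    return [(r, s) for r in range(1, n) for s in range(r, n) if r * s % n == 0]
```

For n = 5 every non-zero element is invertible and there are no zero divisors; for n = 6 the pair (2, 3) multiplies to zero, so the integers mod 6 do not form a field.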



Vector spaces 

Let K be a field. A vector space V over K is an algebraic structure in which we
can add two elements of V, and multiply an element of V by an element of K (this
is called scalar multiplication), such that the following rules hold:

Addition laws

(VA0) For any $u, v \in V$, there is a unique element $u + v \in V$.

(VA1) For all $u, v, w \in V$, we have $u + (v + w) = (u + v) + w$.

(VA2) There is an element $0 \in V$ such that $v + 0 = 0 + v = v$ for all $v \in V$.

(VA3) For any $v \in V$, there exists $-v \in V$ such that $v + (-v) = (-v) + v = 0$.

(VA4) For any $u, v \in V$, we have $u + v = v + u$.

Scalar multiplication laws

(VM0) For any $a \in K$, $v \in V$, there is a unique element $av \in V$.

(VM1) For any $a \in K$, $u, v \in V$, we have $a(u + v) = au + av$.

(VM2) For any $a, b \in K$, $v \in V$, we have $(a + b)v = av + bv$.

(VM3) For any $a, b \in K$, $v \in V$, we have $(ab)v = a(bv)$.

(VM4) For any $v \in V$, we have $1v = v$ (where 1 is the element given by (FM2)).

Again, we can summarise (VA0)-(VA4) by saying that $(V, +)$ is an abelian
group.

The most important example of a vector space over a field K is the set $K^n$ of
all n-tuples of elements of K: the addition and scalar multiplication are defined
by the rules

$$(u_1, u_2, \ldots, u_n) + (v_1, v_2, \ldots, v_n) = (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n),$$

$$a(v_1, v_2, \ldots, v_n) = (av_1, av_2, \ldots, av_n).$$

The fact that $K^n$ is a vector space will be assumed here. Proofs are straightforward
but somewhat tedious. Here is a particularly easy one, the proof of (VM4),
as an example.

If $v = (v_1, \ldots, v_n)$, then

$$1v = 1(v_1, \ldots, v_n) = (1v_1, \ldots, 1v_n) = (v_1, \ldots, v_n) = v.$$

The second step uses the definition of scalar multiplication in $K^n$, and the third
step uses the field axiom (FM2).



Appendix B

Vandermonde and circulant
matrices

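The Vandermonde matrix $V(a_1, a_2, \ldots, a_n)$ is the $n \times n$ matrix

$$\begin{bmatrix} 1 & 1 & \cdots & 1 \\ a_1 & a_2 & \cdots & a_n \\ a_1^2 & a_2^2 & \cdots & a_n^2 \\ \vdots & \vdots & & \vdots \\ a_1^{n-1} & a_2^{n-1} & \cdots & a_n^{n-1} \end{bmatrix}.$$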

This is a particularly important type of matrix. We can write down its deter- 
minant explicitly: 

Theorem B.1

$$\det(V(a_1, a_2, \ldots, a_n)) = \prod_{i<j} (a_j - a_i).$$

That is, the determinant is the product of the differences between all pairs of
parameters $a_i$. From this theorem, we draw the following conclusion:

Corollary B.2 The matrix $V(a_1, a_2, \ldots, a_n)$ is invertible if and only if the
parameters $a_1, a_2, \ldots, a_n$ are all distinct.

For the determinant can be zero only if one of the factors vanishes.

Proof To prove the theorem, we first regard $a_n$ as a variable x, so that the
determinant $\Delta$ is a polynomial $f(x)$ of degree $n - 1$ in x. We see that $f(a_i) = 0$
for $1 \le i \le n - 1$, since the result is the determinant of a matrix with two equal
columns. By the Factor Theorem,

$$\Delta = K(x - a_1)(x - a_2) \cdots (x - a_{n-1}),$$

where K is independent of x. In other words, the original determinant is
$K(a_n - a_1) \cdots (a_n - a_{n-1})$. In the same way, all differences $(a_j - a_i)$ for $i < j$ are factors,
so that the determinant is $K_0$ times the product of all these differences, where $K_0$
does not contain any of $a_1, \ldots, a_n$, that is, $K_0$ is a constant.

To find $K_0$, we observe that the leading diagonal of the matrix gives us a term
$a_2 a_3^2 \cdots a_n^{n-1}$ in the determinant with sign $+1$; but this product is obtained by
taking the term with larger index from each factor in the product, also giving sign
$+1$. So $K_0 = 1$ and the theorem is proved.
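
Theorem B.1 is easy to test on small integer examples. The sketch below is our own illustration in plain Python (the function names are hypothetical, and `det` is a naive Laplace expansion suitable only for small n); it builds the matrix with the powers of the parameters down the columns, as in the definition above.

```python
from itertools import combinations
from math import prod

def vandermonde(params):
    """Row i holds the i-th powers a_1^i, ..., a_n^i (i = 0, ..., n-1)."""
    n = len(params)
    return [[a ** i for a in params] for i in range(n)]

def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def product_of_differences(params):
    """The product of (a_j - a_i) over all pairs i < j."""
    return prod(params[j] - params[i]
                for i, j in combinations(range(len(params)), 2))
```

For the parameters (1, 2, 4) both sides equal 6; as soon as two parameters coincide, both sides are 0, in line with Corollary B.2.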



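Another general type of matrix whose determinant can be calculated explicitly
is the circulant matrix, whose general form is as follows:

$$C(a_0, \ldots, a_{n-1}) = \begin{bmatrix} a_0 & a_1 & a_2 & \cdots & a_{n-1} \\ a_{n-1} & a_0 & a_1 & \cdots & a_{n-2} \\ a_{n-2} & a_{n-1} & a_0 & \cdots & a_{n-3} \\ \vdots & \vdots & \vdots & & \vdots \\ a_1 & a_2 & a_3 & \cdots & a_0 \end{bmatrix}.$$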

Theorem B.3 Let $C = C(a_0, \ldots, a_{n-1})$ be a circulant matrix over the field $\mathbb{C}$. Let
$\omega = e^{2\pi\mathrm{i}/n}$ be a primitive nth root of unity. Then

(a) C is diagonalisable;

(b) the eigenvalues of C are $\sum_{j=0}^{n-1} a_j \omega^{jk}$, for $k = 0, 1, \ldots, n-1$;

(c) det(C) is the product of the eigenvalues listed in (b).



Proof We can write down the eigenvectors. For $k = 0, 1, \ldots, n-1$, let
$v_k = [1\ \ \omega^k\ \ \ldots\ \ \omega^{(n-1)k}]^\top$. Numbering entries from 0, the jth entry in $Cv_k$ is

$$a_{n-j} + a_{n-j+1}\,\omega^k + \cdots + a_{n-j-1}\,\omega^{(n-1)k}$$
$$= a_0\,\omega^{jk} + \cdots + a_{n-j-1}\,\omega^{(n-1)k} + a_{n-j}\,\omega^{nk} + \cdots + a_{n-1}\,\omega^{(n+j-1)k}$$
$$= \omega^{jk}\bigl(a_0 + a_1\omega^k + \cdots + a_{n-1}\omega^{(n-1)k}\bigr),$$

using the fact that $\omega^n = 1$. This is $a_0 + a_1\omega^k + \cdots + a_{n-1}\omega^{(n-1)k}$ times the jth
entry in $v_k$. So

$$Cv_k = \bigl(a_0 + a_1\omega^k + \cdots + a_{n-1}\omega^{(n-1)k}\bigr)\,v_k,$$

as required.

Now the vectors $v_0, \ldots, v_{n-1}$ are linearly independent. (Why? They are the
columns of a Vandermonde matrix $V(1, \omega, \ldots, \omega^{n-1})$, and the powers of $\omega$ are
all distinct; so the first part of this appendix shows that the determinant of this
matrix is non-zero, so that the columns are linearly independent.) Hence we have
diagonalised C, and its eigenvalues are as claimed.

Finally, for part (c), the determinant of a diagonalisable matrix is the product
of its eigenvalues.
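
The eigenvectors and eigenvalues in this proof can be checked numerically. Here is a minimal sketch using Python's `cmath` module; the function names are our own, and entries are indexed from 0 as in the proof.

```python
import cmath

def circulant(a):
    """The matrix C(a_0, ..., a_{n-1}): entry (j, m) is a_{(m - j) mod n}."""
    n = len(a)
    return [[a[(m - j) % n] for m in range(n)] for j in range(n)]

def circulant_eigenpairs(a):
    """Pairs (eigenvalue, eigenvector v_k) for k = 0, ..., n-1."""
    n = len(a)
    omega = cmath.exp(2j * cmath.pi / n)
    pairs = []
    for k in range(n):
        lam = sum(a[j] * omega ** (j * k) for j in range(n))
        v = [omega ** (m * k) for m in range(n)]
        pairs.append((lam, v))
    return pairs

def apply(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

a = [1, 2, 3]
C = circulant(a)                       # [[1, 2, 3], [3, 1, 2], [2, 3, 1]]
pairs = circulant_eigenpairs(a)
product = 1
for lam, _ in pairs:
    product *= lam                     # equals det(C) up to rounding
```

For this choice the product of the eigenvalues is 18 (up to rounding), which agrees with the determinant computed directly.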



Example B.1 We have the identity

$$a^3 + b^3 + c^3 - 3abc = \begin{vmatrix} a & b & c \\ c & a & b \\ b & c & a \end{vmatrix} = (a + b + c)(a + \omega b + \omega^2 c)(a + \omega^2 b + \omega c),$$

where $\omega = e^{2\pi\mathrm{i}/3}$.

This formula has an application to solving cubic equations. Consider the
equation

$$x^3 + ax^2 + bx + c = 0.$$

By "completing the cube", putting $y = x + \tfrac13 a$, we get rid of the square term:

$$y^3 + dy + e = 0$$

for some d, e. Now, as above, we have

$$y^3 - 3uvy + u^3 + v^3 = (y + u + v)(y + \omega u + \omega^2 v)(y + \omega^2 u + \omega v),$$

so if we can find u and v satisfying $-3uv = d$ and $u^3 + v^3 = e$, then the solutions
of the equation are $y = -u - v$, $y = -\omega u - \omega^2 v$, and $y = -\omega^2 u - \omega v$.

Let $U = u^3$ and $V = v^3$. Then $U + V = e$ and $UV = -d^3/27$. Thus we can find
U and V by solving the quadratic equation $z^2 - ez - d^3/27 = 0$. Now u is a cube
root of U, and then $v = -d/(3u)$, and we are done.
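
The recipe translates directly into code. The sketch below is our own illustration using `cmath`; the function names are hypothetical, it assumes $d \ne 0$ (so that $U \ne 0$ and the division by u is legitimate), and it returns floating-point approximations. The formulas for d and e come from expanding $(y - \tfrac13 a)^3 + a(y - \tfrac13 a)^2 + b(y - \tfrac13 a) + c$.

```python
import cmath

def solve_depressed_cubic(d, e):
    """Roots of y^3 + d*y + e = 0 by the u, v substitution (assumes d != 0)."""
    # U and V are the roots of z^2 - e z - d^3/27 = 0; take one of them as U.
    root = cmath.sqrt(e * e + 4 * d ** 3 / 27)
    U = (e + root) / 2
    u = U ** (1 / 3)            # any cube root of U will do
    v = -d / (3 * u)            # forces -3uv = d, and then v^3 = V
    omega = cmath.exp(2j * cmath.pi / 3)
    return [-u - v,
            -omega * u - omega ** 2 * v,
            -omega ** 2 * u - omega * v]

def solve_cubic(a, b, c):
    """Roots of x^3 + a x^2 + b x + c = 0 via y = x + a/3."""
    d = b - a * a / 3
    e = c - a * b / 3 + 2 * a ** 3 / 27
    return [y - a / 3 for y in solve_depressed_cubic(d, e)]
```

For example, `solve_depressed_cubic(-7, 6)` returns approximations to the roots 1, 2 and -3 of $y^3 - 7y + 6 = 0$, in some order and with negligible imaginary parts.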



Remark The formula for the determinant of a circulant matrix works over any
field K which contains a primitive nth root of unity.



Appendix C

The Friendship Theorem



The Friendship Theorem states: 

Given a finite set of people with the property that any two have a 
unique common friend, there must be someone who is everyone else's 
friend. 

The theorem asserts that the configuration must look like this, where we rep- 
resent people by dots and friendship by edges: 
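
The configuration in question consists of a number of triangles sharing a single common vertex (sometimes called a windmill). As a sketch in plain Python, with function names of our own choosing, we can build such a configuration and confirm that it has the property in the theorem.

```python
from itertools import combinations

def windmill(k):
    """Friendship sets for k triangles sharing the single person 0."""
    friends = {0: set()}
    for t in range(k):
        a, b = 2 * t + 1, 2 * t + 2
        friends[a] = {0, b}
        friends[b] = {0, a}
        friends[0] |= {a, b}
    return friends

def has_friendship_property(friends):
    """Do any two people have exactly one common friend?"""
    return all(len(friends[p] & friends[q]) == 1
               for p, q in combinations(friends, 2))

def universal_friends(friends):
    """People who are friends with everyone else."""
    return [p for p in friends if friends[p] == set(friends) - {p}]
```

For instance, `windmill(3)` describes seven people, any two of whom have a unique common friend, and person 0 is everyone else's friend. A cycle of four people fails the hypothesis, since opposite people have two common friends.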




The proof of the theorem is in two parts. The first part is "graph theory", the 
second uses linear algebra. We argue by contradiction, and so we assume that we 
have a counterexample to the theorem. 

Step 1: Graph theory We show that there is a number m such that everyone 
has exactly m friends. [In the terminology of graph theory, this says that we have 
a regular graph of valency m.] 

To prove this, we notice first that if $P_1$ and $P_2$ are not friends, then they have
the same number of friends. For they have one common friend $P_3$; any further
friend Q of $P_1$ has a common friend R with $P_2$, and conversely, so we can match
up the common friends as in the next picture.



Now let us suppose that there are two people P and Q who have different 
numbers of friends. By the preceding argument, P and Q must be friends. They 
have a common friend R. Any other person S must have a different number of 
friends from either P or Q, and so must be the friend of either P or Q (but not 
both). Now if S is the friend of P but not Q, and T is the friend of Q but not P, 
then any possible choice of the common friend of S and T leads to a contradiction. 
So this is not possible; that is, either everyone else is P's friend, or everyone else 
is Q's friend. But this means that we don't have a counterexample after all. 

So we conclude this step knowing that the number of friends of each person is 
the same, say m, as claimed. 

Step 2: Linear algebra We prove that m = 2. 

Suppose that there are n people $P_1, \ldots, P_n$. Let A be the $n \times n$ matrix whose
$(i,j)$ entry is 1 if $P_i$ and $P_j$ are friends, and is 0 otherwise. Then by assumption,
A is an $n \times n$ symmetric matrix. Let J be the $n \times n$ matrix with every entry equal
to 1; then J is also symmetric.

Consider the product AJ. Since every entry of J is equal to 1, the $(i,j)$ entry
of AJ is just the number of ones in the ith row of A, which is the number of
friends of $P_i$; this is m, by Step 1. So every entry of AJ is m, whence $AJ = mJ$.
Similarly, $JA = mJ$. Thus, A and J are commuting symmetric matrices, and so
by Theorem 7.9, they can be simultaneously diagonalised. We will calculate their
eigenvalues.

First let us consider J. If $\mathbf{j}$ is the column vector with all entries 1, then clearly
$J\mathbf{j} = n\mathbf{j}$, so $\mathbf{j}$ is an eigenvector of J with eigenvalue n. The other eigenvectors of J
are orthogonal to $\mathbf{j}$. Now $v \cdot \mathbf{j} = 0$ means that the sum of the components of v is
zero; this implies that $Jv = 0$. So any vector orthogonal to $\mathbf{j}$ is an eigenvector of J
with eigenvalue 0.

Now we turn to A, and observe that

$$A^2 = (m-1)I + J.$$

For the $(i,j)$ entry of $A^2$ is equal to the number of people who are friends of
both $P_i$ and $P_j$. If $i = j$, this number is m, while if $i \ne j$ then (by assumption)
it is 1. So $A^2$ has diagonal entries m and off-diagonal entries 1, so it is equal to
$(m-1)I + J$, as claimed.

The all-one vector $\mathbf{j}$ satisfies $A\mathbf{j} = m\mathbf{j}$, so is an eigenvector of A with
eigenvalue m. This shows, in particular, that

$$m^2\mathbf{j} = A^2\mathbf{j} = ((m-1)I + J)\mathbf{j} = (m - 1 + n)\mathbf{j},$$

so that $n = m^2 - m + 1$. (Exercise: Prove this by a counting argument in the
graph.)

As before, the remaining eigenvectors of A are orthogonal to $\mathbf{j}$, and so are
eigenvectors of J with eigenvalue 0. Thus, if v is an eigenvector of A with
eigenvalue $\lambda$, not a multiple of $\mathbf{j}$, then

$$\lambda^2 v = A^2 v = ((m-1)I + J)v = (m-1)v,$$

so $\lambda^2 = m - 1$, and $\lambda = \pm\sqrt{m-1}$.

The diagonal entries of A are all zero, so its trace is zero. So if we let f and g
be the multiplicities of $\sqrt{m-1}$ and $-\sqrt{m-1}$ as eigenvalues of A, we have

$$0 = \mathrm{Tr}(A) = m + f\sqrt{m-1} + g(-\sqrt{m-1}) = m + (f - g)\sqrt{m-1}.$$

Since $m \ne 0$, we have $f \ne g$, so $\sqrt{m-1}$ is rational. This shows that $m - 1$ must be a
perfect square, say $m - 1 = u^2$, from which we see that m is congruent to 1 mod u.
But the trace equation is $0 = m + (f - g)u$; this says that $0 \equiv 1 \bmod u$. This is
only possible if $u = 1$. But then $m = 2$, $n = 3$, and
we have the Three Musketeers (three individuals, any two being friends). This
configuration does indeed satisfy the hypotheses of the Friendship Theorem; but
it is after all not a counterexample, since each person is everyone else's friend. So
the theorem is proved.



Appendix D

Who is top of the league?



In most league competitions, teams are awarded a fixed number of points for a 
win or a draw. It may happen that two teams win the same number of matches and 
so are equal on points, but the opponents beaten by one team are clearly "better" 
than those beaten by the other. How can we take this into account? 

You might think of giving each team a "score" to indicate how strong it is, and 
then adding the scores of all the teams beaten by team T to see how well T has 
performed. Of course this is self-referential, since the score of T depends on the 
scores of the teams that T beats. So suppose we ask simply that the score of T 
should be proportional to the sum of the scores of all the teams beaten by T . 

Now we can translate the problem into linear algebra. Let $T_1, \ldots, T_n$ be the
teams in the league. Let A be the $n \times n$ matrix whose $(i,j)$ entry is equal to 1 if $T_i$
beats $T_j$, and 0 otherwise. Now for any vector $x = [x_1\ x_2\ \ldots\ x_n]^\top$ of scores, the
ith entry of $Ax$ is equal to the sum of the scores $x_j$ for all teams $T_j$ beaten by $T_i$.
So our requirement is simply that

x should be an eigenvector of A with all entries positive.

Here is an example. There are six teams A, B, C, D, E, and F. Suppose that

A beats B, C, D, E;
B beats C, D, E, F;
C beats D, E, F;
D beats E, F;
E beats F;
F beats A.

The matrix A is

$$\begin{bmatrix} 0 & 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

We see that A and B each have four wins, but that A has generally beaten the 
stronger teams; there was one upset when F beat A. Also, E and F have the fewest 
wins, but F took A's scalp and should clearly be better. 
Calculation with Maple shows that the vector

\[ [0.7744\ \ 0.6452\ \ 0.4307\ \ 0.2875\ \ 0.1920\ \ 0.3856]^\top \]

is an eigenvector of A with eigenvalue 2.0085. This confirms our view that A is 
top of the league and that F is ahead of E; it even puts F ahead of D. 
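
The calculation can be repeated without Maple. The following is only a sketch: it uses power iteration applied to A + I (the shift is an assumption of this example, chosen so that the iteration cannot oscillate), and the function name `perron_vector` is hypothetical. The result is scaled so that its entries sum to 1, so it should agree with the vector above only up to a scalar multiple.

```python
def perron_vector(A, iterations=500):
    """Approximate the positive eigenvector of a non-negative indecomposable matrix.

    Power iteration is applied to A + I rather than to A itself; the shift
    keeps the same eigenvectors and moves every eigenvalue up by 1.
    """
    n = len(A)
    x = [1.0] * n
    for _ in range(iterations):
        # y = (A + I) x
        y = [x[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        x = [value / total for value in y]  # rescale so the entries sum to 1
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(Ax) / sum(x)  # valid once x has converged, since Ax = lambda x
    return eigenvalue, x


teams = "ABCDEF"
# results[i][j] = 1 if team i beats team j (the six-team example)
results = [
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0],
]
eigenvalue, scores = perron_vector(results)
ranking = sorted(teams, key=lambda t: -scores[teams.index(t)])
print(round(eigenvalue, 4), "".join(ranking))
```

Sorting the teams by score gives the league order directly, which is how the vector is used in the discussion above.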

But perhaps there is a different eigenvalue and/or eigenvector which would 
give us a different result? 

In fact, there is a general theorem called the Perron-Frobenius theorem which 
gives us conditions for this method to give a unique answer. Before we state it, 
we need a definition. 

Definition D.1 Let $A$ be an $n \times n$ real matrix with all its entries non-negative. We
say that $A$ is indecomposable if, for any $i, j$ with $1 \le i, j \le n$, there is a number $m$
such that the $(i, j)$ entry of $A^m$ is strictly positive.

This odd-looking condition means, in our football league situation, that for
any two teams $T_i$ and $T_j$, there is a chain $T_{k_0}, \ldots, T_{k_m}$ with $T_{k_0} = T_i$ and $T_{k_m} = T_j$,
such that each team in the chain beats the next one. Now it can be shown that
the only way that this can fail is if there is a collection C of teams such that each 
team in C beats each team not in C. In this case, obviously the teams in C occupy 
the top places in the league, and we have reduced the problem to ordering these 
teams. So we can assume that the matrix of results is indecomposable. 

In our example, we see that B beats F beats A, so the (2, 1) entry in $A^2$ is
non-zero. Similarly for all other pairs. So A is indecomposable in this case.
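
Checking "all other pairs" by hand is tedious, so here is a hedged sketch of a mechanical check. It assumes that the number m in the definition is at least 1 and reads the condition as reachability along chains of wins; the function name `is_indecomposable` is hypothetical, and Warshall's transitive-closure algorithm is one possible choice of method.

```python
def is_indecomposable(A):
    """Return True if, for every (i, j), some power A^m with m >= 1 has a positive (i, j) entry."""
    n = len(A)
    # reach[i][j] is True when a chain of length >= 1 leads from i to j
    reach = [[A[i][j] > 0 for j in range(n)] for i in range(n)]
    for k in range(n):  # Warshall's transitive-closure algorithm
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return all(all(row) for row in reach)


league = [
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0],
]
one_sided = [[0, 1], [0, 0]]  # hypothetical two-team league in which only the first team wins
print(is_indecomposable(league), is_indecomposable(one_sided))
```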

Theorem D.1 (Perron-Frobenius Theorem) Let $A$ be an $n \times n$ real matrix with
all its entries non-negative, and suppose that $A$ is indecomposable. Then, up to
scalar multiplication, there is a unique eigenvector $v = [x_1\ \ldots\ x_n]^\top$ for $A$ with
the property that $x_i > 0$ for all $i$. The corresponding eigenvalue is the largest
eigenvalue of $A$.

So the Perron-Frobenius eigenvector solves the problem of ordering the teams
in the league.

Remark Sometimes even this extra level of sophistication doesn't guarantee a
result. Suppose, for example, that there are five teams A, B, C, D, E; and suppose
that A beats B and C, B beats C and D, C beats D and E, D beats E and A, and E
beats A and B. Each team wins two games, so the simple rule gives them all the
same score. The matrix A is

\[
A = \begin{bmatrix}
0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0
\end{bmatrix},
\]

which is easily seen to be indecomposable; and if v is the all-1 vector, then Av =
2v, so that v is the Perron-Frobenius eigenvector. So even with this method, all
teams get the same score. In this case, it is clear that there is so much symmetry 
between the teams that none can be put above the others by any possible rule. 

Remark Further refinements are clearly possible. For example, instead of just
putting the $(i, j)$ entry equal to 1 if $T_i$ beats $T_j$, we could take it to be the number
of goals by which $T_i$ won the game.

Remark This procedure has wider application. How does an Internet search 
engine like Google find the most important web pages that match a given query? 
An important web page is one to which a lot of other web pages link; this can be 
described by a matrix, and we can use the Perron-Frobenius eigenvector to do the 
ranking.

Appendix E

Other canonical forms 



One of the unfortunate things about linear algebra is that there are many types 
of equivalence relation on matrices! In this appendix I say a few brief words 
about some that we have not seen elsewhere in the course. Some of these will be 
familiar to you from earlier linear algebra courses, while others arise in courses 
on different parts of mathematics (coding theory, group theory, etc.) 

Row-equivalence 

Two matrices A and B of the same size over K are said to be row-equivalent if 
there is an invertible matrix P such that B = PA. Equivalently, A and B are
row-equivalent if we can transform A into B by the use of elementary row operations
only. (This is true because any invertible matrix can be written as a product of 
elementary matrices; see Corollary 2.6.) 

A matrix A is said to be in echelon form if the following conditions hold: 

• The first non-zero entry in any row (if it exists) is equal to 1 (these entries 
are called the leading ones); 

• The leading ones in rows lower in the matrix occur further to the right. 

We say that A is in reduced echelon form if, in addition to these two conditions, 
also 

• All the other entries in the column containing a leading one are zero. 
For example, the matrix 

'0 \ a b Q c' 
1 J 


is in reduced echelon form, whatever the values of a, . . . , e.

Theorem E.1 Any matrix is row-equivalent to a unique matrix in reduced echelon
form.
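
As an illustration of the theorem, the reduced echelon form can be computed by Gauss-Jordan elimination. This is a minimal sketch rather than a definitive implementation: it works over the rationals using Python's `fractions` module, the function name `reduced_echelon_form` is hypothetical, and the sample matrix is chosen only for illustration.

```python
from fractions import Fraction


def reduced_echelon_form(rows):
    """Return the reduced echelon form of a matrix given as a list of rows."""
    M = [[Fraction(x) for x in row] for row in rows]
    n_rows = len(M)
    n_cols = len(M[0]) if M else 0
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # look for a row at or below pivot_row with a non-zero entry in this column
        pivot = next((r for r in range(pivot_row, n_rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        lead = M[pivot_row][col]
        M[pivot_row] = [x / lead for x in M[pivot_row]]  # make the leading entry 1
        for r in range(n_rows):
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]
                # clear the rest of the column containing the leading one
                M[r] = [a - factor * b for a, b in zip(M[r], M[pivot_row])]
        pivot_row += 1
    return M


sample = [[1, 2, 4, -1, 5], [1, 2, 3, -1, 3], [-1, -2, 0, 1, 3]]
for row in reduced_echelon_form(sample):
    print([str(entry) for entry in row])
```

Exact rational arithmetic is used so that the test for a non-zero pivot is reliable; with floating-point entries a tolerance would be needed instead.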



Coding equivalence 

In the theory of error-correcting codes, we meet a notion of equivalence which lies 
somewhere between row-equivalence and equivalence. As far as I know it does 
not have a standard name. 

Two matrices A and B of the same size are said to be coding-equivalent if B
can be obtained from A by a combination of arbitrary row operations and column
operations of Types 2 and 3 only. (See page 16.)

Using these operations, any matrix can be put into block form

\[
\begin{bmatrix} I_r & A \\ O & O \end{bmatrix}
\]

for some matrix A. To see this, use row operations to put the matrix into reduced
echelon form, then column permutations to move the columns containing the leading
ones to the front of the matrix.

Unfortunately this is not a canonical form; a matrix can be coding-equivalent
to several different matrices of this special form.

It would take us too far afield to explain why this equivalence relation is
important in coding theory.



Congruence over other fields 

Recall that two symmetric matrices A and B, over a field K whose characteristic is
not 2, are congruent if $B = P^\top A P$ for some invertible matrix P. This is the natural
relation arising from representing a quadratic form relative to different bases.

We saw in Chapter 5 the canonical form for this relation in the cases when K 
is the real or complex numbers. 

In other cases, it is usually much harder to come up with a canonical form. 
Here is one of the few cases where this is possible. I state the result for quadratic 
forms. 



Theorem E.2 Let $\mathbb{F}_p$ be the field of integers mod p, where p is an odd prime. Let
c be a fixed element of $\mathbb{F}_p$ which is not a square. A quadratic form q in n variables
over $\mathbb{F}_p$ can be put into one of the forms

\[
x_1^2 + \cdots + x_r^2, \qquad x_1^2 + \cdots + x_{r-1}^2 + c\,x_r^2
\]

by an invertible linear change of variables. Any quadratic form is congruent to
just one form of one of these types.

Worked examples



1. Let

\[
A = \begin{bmatrix}
1 & 2 & 4 & -1 & 5 \\
1 & 2 & 3 & -1 & 3 \\
-1 & -2 & 0 & 1 & 3
\end{bmatrix}.
\]

(a) Find a basis for the row space of A.

(b) What is the rank of A?

(c) Find a basis for the column space of A.

(d) Find invertible matrices P and Q such that PAQ is in the canonical
form for equivalence.

(a) Subtract the first row from the second, add the first row to the third, then
multiply the new second row by $-1$ and subtract four times this row from the
third, to get the matrix

\[
B = \begin{bmatrix}
1 & 2 & 4 & -1 & 5 \\
0 & 0 & 1 & 0 & 2 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}.
\]

The first two rows clearly form a basis for the row space.

(b) The rank is 2, since there is a basis with two elements.

(c) The column rank is equal to the row rank and so is also equal to 2. By
inspection, the first and third columns of A are linearly independent, so they form
a basis. The first and second columns are not linearly independent, so we cannot
use these! (Note that we have to go back to the original A here; row operations
change the column space, so selecting two independent columns of B would not
be correct.)

(d) By step (a), we have PA = B, where P is obtained by performing the same
elementary row operations on the $3 \times 3$ identity matrix $I_3$:

\[
P = \begin{bmatrix}
1 & 0 & 0 \\
1 & -1 & 0 \\
-3 & 4 & 1
\end{bmatrix}.
\]



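Now B can be brought to the canonical form

\[
C = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

by subtracting 2, 4, $-1$ and 5 times the first column from the second, third, fourth
and fifth columns, and twice the third column from the fifth, and then swapping
the second and third columns; so C = BQ (whence C = PAQ), where Q is obtained
by performing the same column operations on $I_5$:

\[
Q = \begin{bmatrix}
1 & -4 & -2 & 1 & 3 \\
0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & -2 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}.
\]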



Remark: P and Q can also be found by multiplying elementary matrices, if 
desired; but the above method is simpler. You may find it easier to write an identity 
matrix after A and perform the row operations on the extended matrix to find P, 
and to put an identity matrix underneath B and perform the column operations on 
the extended matrix to find Q. 
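
A quick machine check of the answer is worthwhile, since sign slips are easy to make. The sketch below multiplies the matrices out in plain Python. The entries of A, P and Q are written with the zero entries filled in from the row and column operations described in the solution, which should be treated as an assumption of this example; `mat_mul` is a hypothetical helper.

```python
def mat_mul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


A = [[1, 2, 4, -1, 5],
     [1, 2, 3, -1, 3],
     [-1, -2, 0, 1, 3]]
P = [[1, 0, 0],
     [1, -1, 0],
     [-3, 4, 1]]
Q = [[1, -4, -2, 1, 3],
     [0, 0, 1, 0, 0],
     [0, 1, 0, 0, -2],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 1]]

B = mat_mul(P, A)  # result of the row operations
C = mat_mul(B, Q)  # result of the column operations
for row in C:
    print(row)
```

The product PAQ has the 2 x 2 identity in its top left corner and zeros elsewhere, consistent with the rank being 2.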



2. A certain country has n political parties $\mathcal{P}_1, \ldots, \mathcal{P}_n$. At the
beginning of the year, the percentage of voters who supported the
party $\mathcal{P}_i$ was $x_i$. During the year, some voters change their minds; a
proportion $a_{ij}$ of former supporters of $\mathcal{P}_j$ will support $\mathcal{P}_i$ at the end
of the year.

Let v be the vector $[x_1\ x_2\ \cdots\ x_n]^\top$ recording support for the parties
at the beginning of the year, and A the matrix whose $(i, j)$ entry is $a_{ij}$.

(a) Prove that the vector giving the support for the parties at the end
of the year is Av.

(b) In subsequent years, exactly the same thing happens, with the
same proportions. Show that the vector giving the support for
the parties at the end of m years is $A^m v$.

(c) Suppose that n = 2 and that

\[
A = \begin{bmatrix} 0.9 & 0.3 \\ 0.1 & 0.7 \end{bmatrix}.
\]

Show that, after a long time, the support for the parties will be
approximately 0.75 for $\mathcal{P}_1$ to 0.25 for $\mathcal{P}_2$.

(a) Let $y_i$ be the proportion of the population who support $\mathcal{P}_i$ at the end of
the year. From what we are given, the proportion supporting $\mathcal{P}_j$ at the beginning
of the year was $x_j$, and a fraction $a_{ij}$ of these changed their support to $\mathcal{P}_i$. So
the proportion of the whole population who supported $\mathcal{P}_j$ at the beginning of the
year and $\mathcal{P}_i$ at the end is $a_{ij} x_j$. The total support for $\mathcal{P}_i$ is found by adding these
up for all j: that is,

\[
y_i = \sum_{j=1}^{n} a_{ij} x_j,
\]

or $v' = Av$, where $v'$ is the vector $[y_1\ \ldots\ y_n]^\top$ giving support for the parties at
the end of the year.

(b) Let $v_k$ be the column vector whose ith component is the proportion of the
population supporting party $\mathcal{P}_i$ after the end of k years. In part (a), we showed
that $v_1 = A v_0$, where $v_0 = v$. An exactly similar argument shows that $v_k = A v_{k-1}$
for any k. So by induction, $v_m = A^m v_0 = A^m v$, as required. (The result of (a) starts
the induction with m = 1. If we assume that $v_{k-1} = A^{k-1} v$, then

\[
v_k = A v_{k-1} = A(A^{k-1} v) = A^k v,
\]

and the induction step is proved.)

(c) The matrix A has characteristic polynomial

\[
\begin{vmatrix} x - 0.9 & -0.3 \\ -0.1 & x - 0.7 \end{vmatrix}
= x^2 - 1.6x + 0.6 = (x - 1)(x - 0.6).
\]

So the eigenvalues of A are 1 and 0.6. We find by solving linear equations that
eigenvectors for the two eigenvalues are $[3\ \ 1]^\top$ and $[1\ \ {-1}]^\top$ respectively. As in the
text, we compute that the corresponding projections are

\[
P_1 = \begin{bmatrix} 0.75 & 0.75 \\ 0.25 & 0.25 \end{bmatrix}, \qquad
P_2 = \begin{bmatrix} 0.25 & -0.75 \\ -0.25 & 0.75 \end{bmatrix}.
\]

(Once we have found $P_1$, we can find $P_2$ as $I - P_1$.) Then A is diagonalisable:

\[ A = P_1 + 0.6\,P_2. \]

From this and Proposition 4.6 we see that

\[ A^m = P_1 + (0.6)^m P_2. \]

As $m \to \infty$, we have $(0.6)^m \to 0$, and so $A^m \to P_1$. So in the limit, if $v_0 = [x\ \ y]^\top$ is
the vector giving the initial support for the parties, with $x + y = 1$, then the vector
giving the final support is approximately

\[
\begin{bmatrix} 0.75 & 0.75 \\ 0.25 & 0.25 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} 0.75(x + y) \\ 0.25(x + y) \end{bmatrix}
= \begin{bmatrix} 0.75 \\ 0.25 \end{bmatrix}.
\]



As a check, use the computer with Maple to work out $A^m$ for some large value
of m. For example, I find that

\[
A^{10} = \begin{bmatrix}
0.7515116544 & 0.7454650368 \\
0.2484883456 & 0.2545349632
\end{bmatrix}.
\]
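
The same check can be made without Maple. The following sketch uses exact rational arithmetic from Python's `fractions` module; the helper names and the initial support vector `v0` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def mat_pow(X, m):
    """Compute X^m for a square matrix X and an integer m >= 0."""
    size = len(X)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(m):
        result = mat_mul(result, X)
    return result


A = [[Fraction(9, 10), Fraction(3, 10)],
     [Fraction(1, 10), Fraction(7, 10)]]

A10 = mat_pow(A, 10)
for row in A10:
    print([float(entry) for entry in row])

# Hypothetical starting point: 20% support for the first party, 80% for the second.
v0 = [Fraction(1, 5), Fraction(4, 5)]
v50 = [sum(a * x for a, x in zip(row, v0)) for row in mat_pow(A, 50)]
print([float(entry) for entry in v50])
```

Whatever starting vector with entries summing to 1 is used, the support after many years should be close to 0.75 and 0.25, as the projection argument predicts.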



3. The vectors $v_1, v_2, v_3$ form a basis for $V = \mathbb{R}^3$; the dual basis of $V^*$
is $f_1, f_2, f_3$. A second basis for V is given by $w_1 = v_1 + v_2 + v_3$, $w_2 =
2v_1 + v_2 + v_3$, $w_3 = 2v_2 + v_3$. Find the basis of $V^*$ dual to $w_1, w_2, w_3$.

The first dual basis vector $g_1$ satisfies $g_1(w_1) = 1$, $g_1(w_2) = g_1(w_3) = 0$. If
$g_1 = x f_1 + y f_2 + z f_3$, we find

\[
\begin{aligned}
x + y + z &= 1, \\
2x + y + z &= 0, \\
2y + z &= 0,
\end{aligned}
\]

giving $x = -1$, $y = -2$, $z = 4$. So $g_1 = -f_1 - 2f_2 + 4f_3$. Solving two similar sets
of equations gives $g_2 = f_1 + f_2 - 2f_3$ and $g_3 = f_2 - f_3$.

Alternatively, the transition matrix P from the vs to the ws is

\[
P = \begin{bmatrix} 1 & 2 & 0 \\ 1 & 1 & 2 \\ 1 & 1 & 1 \end{bmatrix},
\]

and we showed in Section 5.1.2 that the transition matrix between the dual bases
is

\[
(P^{-1})^\top = \begin{bmatrix} -1 & 1 & 0 \\ -2 & 1 & 1 \\ 4 & -2 & -1 \end{bmatrix}.
\]

The coordinates of the gs in the basis of fs are the columns of this matrix.
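
The second method is easy to mechanise. The sketch below inverts the transition matrix by Gauss-Jordan elimination over the rationals and then transposes it. The function name `inverse` is hypothetical, and the code assumes the matrix is invertible (it raises `StopIteration` otherwise).

```python
from fractions import Fraction


def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    # augment M with the identity matrix
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Columns of P are the coordinates of w1, w2, w3 in the basis v1, v2, v3.
P = [[1, 2, 0],
     [1, 1, 2],
     [1, 1, 1]]
P_inv = inverse(P)
dual_transition = [list(col) for col in zip(*P_inv)]  # transpose of the inverse
for row in dual_transition:
    print([int(entry) for entry in row])
```

The columns of `dual_transition` are the coefficient vectors of the dual basis elements, so the first column should reproduce the coefficients found by solving the equations directly.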



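4. The Fibonacci numbers $F_n$ are defined by the recurrence relation

\[
F_0 = 0, \quad F_1 = 1, \quad F_{n+2} = F_n + F_{n+1} \ \text{for } n \ge 0.
\]

Let A be the matrix $\begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}$. Prove that

\[
A^n = \begin{bmatrix} F_{n-1} & F_n \\ F_n & F_{n+1} \end{bmatrix},
\]

and hence find a formula for $F_n$.

The equation for $A^n$ is proved by induction on n. It is clearly true for n = 1.
Suppose that it holds for n; then

\[
A^{n+1} = A^n \cdot A
= \begin{bmatrix} F_{n-1} & F_n \\ F_n & F_{n+1} \end{bmatrix}
  \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}
= \begin{bmatrix} F_n & F_{n-1} + F_n \\ F_{n+1} & F_n + F_{n+1} \end{bmatrix}
= \begin{bmatrix} F_n & F_{n+1} \\ F_{n+1} & F_{n+2} \end{bmatrix}.
\]

So the induction step is proved.

To find a formula for $F_n$, we show that A is diagonalisable, and then write
$A = \lambda_1 P_1 + \lambda_2 P_2$, where $P_1$ and $P_2$ are projection matrices with sum I satisfying
$P_1 P_2 = P_2 P_1 = 0$. Then we get $A^n = \lambda_1^n P_1 + \lambda_2^n P_2$, and taking the (1,2) entry we
find that

\[
F_n = c_1 \lambda_1^n + c_2 \lambda_2^n,
\]

where $c_1$ and $c_2$ are the (1,2) entries of $P_1$ and $P_2$ respectively.

From here it is just calculation. The eigenvalues of A are the roots of $0 =
\det(xI - A) = x^2 - x - 1$; that is, $\lambda_1, \lambda_2 = \tfrac{1}{2}(1 \pm \sqrt{5})$. (Since the eigenvalues are
distinct, we know that A is diagonalisable, so the method will work.) Now because
$P_1 + P_2 = I$, the (1,2) entries of these matrices are the negatives of each other; so
we have $F_n = c(\lambda_1^n - \lambda_2^n)$. Rather than find $P_1$ explicitly, we can now argue as
follows: $1 = F_1 = c(\lambda_1 - \lambda_2) = c\sqrt{5}$, so that $c = 1/\sqrt{5}$ and

\[
F_n = \frac{1}{\sqrt{5}} \left( \left( \frac{1 + \sqrt{5}}{2} \right)^{n} - \left( \frac{1 - \sqrt{5}}{2} \right)^{n} \right).
\]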
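
Both halves of the argument can be checked numerically. The sketch below reads $F_n$ off as the (1,2) entry of the nth power of the matrix with rows (0, 1) and (1, 1), and compares it with the closed form $c(\lambda_1^n - \lambda_2^n)$, $c = 1/\sqrt{5}$. The function names are hypothetical, and the closed form is evaluated in floating point and rounded, so it is only reliable for moderate n.

```python
def mat_mul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def mat_pow(X, n):
    """Compute X^n for a 2 x 2 matrix X and n >= 0 by repeated squaring."""
    result = [[1, 0], [0, 1]]
    while n > 0:
        if n % 2 == 1:
            result = mat_mul(result, X)
        X = mat_mul(X, X)
        n //= 2
    return result


def fib_matrix(n):
    """F_n read off as the (1, 2) entry of A^n."""
    return mat_pow([[0, 1], [1, 1]], n)[0][1]


def fib_closed_form(n):
    """F_n from the closed formula, rounded to absorb floating-point error."""
    root5 = 5 ** 0.5
    lam1, lam2 = (1 + root5) / 2, (1 - root5) / 2
    return round((lam1 ** n - lam2 ** n) / root5)


print([fib_matrix(n) for n in range(11)])
print(all(fib_matrix(n) == fib_closed_form(n) for n in range(40)))
```

The matrix version uses exact integer arithmetic, so it remains correct for large n where the floating-point formula would lose accuracy.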




5. Let $V_n$ be the vector space of real polynomials of degree at most n.

(a) Show that the function

\[
f \cdot g = \int_0^1 f(x) g(x)\,dx
\]

is an inner product on $V_n$.

(b) In the case n = 3, write down the matrix representing the bilinear
form relative to the basis $1, x, x^2, x^3$ for $V_3$.

(c) Apply the Gram-Schmidt process to the basis $(1, x, x^2)$ to find
an orthonormal basis for $V_2$.

(d) Let $W_n$ be the subspace of $V_n$ consisting of all polynomials $f(x)$
of degree at most n which satisfy $f(0) = f(1) = 0$. Let $D : W_n \to
W_n$ be the linear map given by differentiation: $(Df)(x) = f'(x)$.
Prove that the adjoint of D is $-D$.

(a) Put $b(f, g) = \int_0^1 f(x) g(x)\,dx$. The function b is obviously symmetric. So
we have to show that it is linear in the first variable, that is, that

\[
\begin{aligned}
\int_0^1 (f_1(x) + f_2(x))\, g(x)\,dx &= \int_0^1 f_1(x) g(x)\,dx + \int_0^1 f_2(x) g(x)\,dx, \\
\int_0^1 (c f(x))\, g(x)\,dx &= c \int_0^1 f(x) g(x)\,dx,
\end{aligned}
\]

which are clear from elementary calculus.

We also have to show that the inner product is positive definite, that is, that
$b(f, f) \ge 0$, with equality if and only if $f = 0$. This is clear from properties of
integration.

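(b) If the basis is $f_1 = 1$, $f_2 = x$, $f_3 = x^2$, $f_4 = x^3$, then the $(i, j)$ entry of the
matrix representing b is

\[
\int_0^1 x^{i-1} x^{j-1}\,dx = \frac{1}{i + j - 1},
\]

so the matrix is

\[
\begin{bmatrix}
1 & \tfrac12 & \tfrac13 & \tfrac14 \\
\tfrac12 & \tfrac13 & \tfrac14 & \tfrac15 \\
\tfrac13 & \tfrac14 & \tfrac15 & \tfrac16 \\
\tfrac14 & \tfrac15 & \tfrac16 & \tfrac17
\end{bmatrix}.
\]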



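(c) The first basis vector is clearly 1. To make x orthogonal to 1 we must
replace it by $x + a$ for some a; doing the integral we find that $a = -\tfrac12$. To make $x^2$
orthogonal to the two preceding is the same as making it orthogonal to 1 and x, so
we replace it by $x^2 + bx + c$; we find that

\[
\begin{aligned}
\tfrac13 + \tfrac12 b + c &= 0, \\
\tfrac14 + \tfrac13 b + \tfrac12 c &= 0,
\end{aligned}
\]

so that $b = -1$ and $c = \tfrac16$.

Now $1 \cdot 1 = 1$, $(x - \tfrac12) \cdot (x - \tfrac12) = \tfrac{1}{12}$, and $(x^2 - x + \tfrac16) \cdot (x^2 - x + \tfrac16) = \tfrac{1}{180}$; so
the required basis is

\[
1, \qquad 2\sqrt{3}\,\bigl(x - \tfrac12\bigr), \qquad 6\sqrt{5}\,\bigl(x^2 - x + \tfrac16\bigr).
\]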
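
The arithmetic in part (c) can be checked exactly. In the sketch below a polynomial is stored as a list of coefficients, lowest degree first (a representation chosen for this example), and the inner product is the integral over [0, 1]. The code produces the orthogonal polynomials and their squared norms; dividing each polynomial by the square root of its squared norm gives the orthonormal basis. All function names are hypothetical.

```python
from fractions import Fraction


def inner(f, g):
    """Integral over [0, 1] of f(x) g(x), using the integral of x^(i+j), which is 1/(i+j+1)."""
    return sum(Fraction(a) * Fraction(b) / (i + j + 1)
               for i, a in enumerate(f) for j, b in enumerate(g))


def subtract_multiple(f, c, g):
    """Return the coefficient list of f - c*g."""
    n = max(len(f), len(g))
    f = list(f) + [0] * (n - len(f))
    g = list(g) + [0] * (n - len(g))
    return [Fraction(a) - c * Fraction(b) for a, b in zip(f, g)]


def gram_schmidt(basis):
    """Orthogonalise a list of polynomials (without normalising them)."""
    orthogonal = []
    for f in basis:
        for g in orthogonal:
            f = subtract_multiple(f, inner(f, g) / inner(g, g), g)
        orthogonal.append(f)
    return orthogonal


monomials = [[1], [0, 1], [0, 0, 1]]  # 1, x, x^2
orthogonal = gram_schmidt(monomials)
squared_norms = [inner(f, f) for f in orthogonal]
for f, norm in zip(orthogonal, squared_norms):
    print([str(c) for c in f], str(norm))
```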

(d) Integration by parts shows that

\[
\begin{aligned}
f \cdot D(g) &= \int_0^1 f(x) g'(x)\,dx \\
&= \bigl[ f(x) g(x) \bigr]_0^1 - \int_0^1 f'(x) g(x)\,dx \\
&= -(Df) \cdot g,
\end{aligned}
\]

where the first term vanishes because of the condition on polynomials in $W_n$. Thus,
by definition, the adjoint of D is $-D$.

6. Let A and B be real symmetric matrices. Is each of the following 
statements true or false? Give brief reasons. 

(a) If A and B are orthogonally similar then they are congruent. 

(b) If A and B are orthogonally similar then they are similar. 

(c) If A and B are congruent then they are orthogonally similar. 

(d) If A and B are similar then they are orthogonally similar. 

Recall that A and B are similar if $B = P^{-1}AP$ for some invertible matrix P; they
are congruent if $B = P^\top AP$ for some invertible matrix P; and they are orthogonally
similar if $B = P^{-1}AP$ for some orthogonal matrix P (invertible matrix satisfying
$P^\top = P^{-1}$). Thus it is clear that both (a) and (b) are true.

The Spectral Theorem says that A is orthogonally similar to a diagonal
matrix whose diagonal entries are the eigenvalues. If A and B are similar, then
they have the same eigenvalues, and so are orthogonally similar to the same
diagonal matrix, and so to each other. So (d) is true.

By Sylvester's Law of Inertia, any real symmetric matrix is congruent to a
diagonal matrix with diagonal entries 1, $-1$ and 0. If we choose a symmetric
matrix none of whose eigenvalues is 1, $-1$ or 0, then it is not orthogonally similar
to the Sylvester form. For example, the matrices $I$ and $2I$ are congruent but not
orthogonally similar. So (c) is false.

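7. Find an orthogonal matrix P such that $P^{-1}AP$ and $P^{-1}BP$ are
diagonal, where

\[
A = \begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1
\end{bmatrix}, \qquad
B = \begin{bmatrix}
0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0
\end{bmatrix}.
\]

Remark A and B are commuting symmetric matrices, so we know that the
matrix P exists.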



First solution We have to find an orthonormal basis which consists of
eigenvectors for both matrices.

Some eigenvectors can be found by inspection. If $v_1 = (1, 1, 1, 1)$ then $Av_1 =
4v_1$ and $Bv_1 = 2v_1$. If $v_2 = (1, -1, 1, -1)$ then $Av_2 = 0$ and $Bv_2 = -2v_2$. Any
further eigenvector $v = (x, y, z, w)$ should be orthogonal to both of these, that is,
$x + y + z + w = 0 = x - y + z - w$. So $x + z = 0$ and $y + w = 0$. Conversely, any such
vector satisfies $Av = 0$ and $Bv = 0$. So choose two orthogonal vectors satisfying
these conditions, say $(1, 0, -1, 0)$ and $(0, 1, 0, -1)$. Normalising, we obtain the
required basis: $(1, 1, 1, 1)/2$, $(1, -1, 1, -1)/2$, $(1, 0, -1, 0)/\sqrt{2}$, $(0, 1, 0, -1)/\sqrt{2}$.
So

\[
P = \begin{bmatrix}
\tfrac12 & \tfrac12 & \tfrac{1}{\sqrt2} & 0 \\
\tfrac12 & -\tfrac12 & 0 & \tfrac{1}{\sqrt2} \\
\tfrac12 & \tfrac12 & -\tfrac{1}{\sqrt2} & 0 \\
\tfrac12 & -\tfrac12 & 0 & -\tfrac{1}{\sqrt2}
\end{bmatrix}.
\]
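
The answer can be checked numerically. The sketch below assumes, as the eigenvalue computations in the solution imply, that A is the 4 x 4 all-ones matrix and that B has entry 1 exactly when the row and column indices differ in parity; the columns of P are the four normalised eigenvectors. The helper names are hypothetical.

```python
from math import sqrt


def mat_mul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


A = [[1] * 4 for _ in range(4)]                             # assumed: all-ones matrix
B = [[(i + j) % 2 for j in range(4)] for i in range(4)]     # assumed: 1 where i + j is odd

s = 1 / sqrt(2)
P = [[0.5, 0.5, s, 0.0],
     [0.5, -0.5, 0.0, s],
     [0.5, 0.5, -s, 0.0],
     [0.5, -0.5, 0.0, -s]]

PtP = mat_mul(transpose(P), P)                 # should be the identity (P orthogonal)
PtAP = mat_mul(transpose(P), mat_mul(A, P))    # should be diag(4, 0, 0, 0)
PtBP = mat_mul(transpose(P), mat_mul(B, P))    # should be diag(2, -2, 0, 0)
for row in PtBP:
    print([round(entry, 10) for entry in row])
```

Since P is orthogonal, its transpose is its inverse, so the products computed here are the matrices asked for in the question.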



Second solution Observe that both A and B are circulant matrices. So we know
from Appendix B that the columns of the Vandermonde matrix

\[
\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & i & -1 & -i \\
1 & -1 & 1 & -1 \\
1 & -i & -1 & i
\end{bmatrix}
\]

are eigenvectors of both matrices. The second and fourth columns have
corresponding eigenvalue 0 for both matrices, and hence so do any linear
combinations of them; in particular, we can replace these two columns by their real and
imaginary parts, giving (after a slight rearrangement) the matrix

\[
\begin{bmatrix}
1 & 1 & 1 & 0 \\
1 & -1 & 0 & 1 \\
1 & 1 & -1 & 0 \\
1 & -1 & 0 & -1
\end{bmatrix}.
\]

After normalising the columns, this gives the same solution as the first.

The results of Appendix B also allow us to write down the eigenvalues of A
and B without any calculation. For example, the eigenvalues of B are

\[
1 + 1 = 2, \quad i - i = 0, \quad -1 - 1 = -2, \quad -i + i = 0.
\]

Remark A more elegant solution is the matrix

\[
\frac12 \begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & -1 & 1 & -1 \\
1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1
\end{bmatrix}.
\]

This matrix (without the factor $\tfrac12$) is known as a Hadamard matrix. It is an $n \times n$
matrix H with all entries $\pm 1$ satisfying $H^\top H = nI$. It is known that an $n \times n$
Hadamard matrix cannot exist unless n is 1, 2, or a multiple of 4; however, nobody
has succeeded in proving that a Hadamard matrix of any size n divisible by 4
exists.

The smallest order for which the existence of a Hadamard matrix is still in
doubt is (at the time of writing) n = 668. The previous smallest, n = 428, was
resolved only in 2004 by Hadi Kharaghani and Behruz Tayfeh-Rezaie in Tehran,
by constructing an example.

As a further exercise, show that, if H is a Hadamard matrix of size n, then

\[
\begin{bmatrix} H & H \\ H & -H \end{bmatrix}
\]

is a Hadamard matrix of size 2n. (The Hadamard matrix of size 4
constructed above is of this form.)
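
The doubling construction in the exercise is easy to experiment with before proving it. The sketch below builds the block matrix with blocks H, H, H, -H from a given H and tests the defining property (all entries 1 or -1, and $H^\top H = nI$) directly; the function names `double` and `is_hadamard` are hypothetical. A numerical check of a few sizes is of course not a proof.

```python
def double(H):
    """Build the block matrix [[H, H], [H, -H]] from a square matrix H."""
    top = [row + row for row in H]
    bottom = [row + [-x for x in row] for row in H]
    return top + bottom


def is_hadamard(H):
    """Check that H is square with entries +1 or -1 and that H^T H = n I."""
    n = len(H)
    if any(len(row) != n or any(x not in (1, -1) for x in row) for row in H):
        return False
    columns = list(zip(*H))
    return all(
        sum(a * b for a, b in zip(columns[i], columns[j])) == (n if i == j else 0)
        for i in range(n) for j in range(n)
    )


H1 = [[1]]
H2 = double(H1)
H4 = double(H2)
H8 = double(H4)
for row in H4:
    print(row)
print(is_hadamard(H2), is_hadamard(H4), is_hadamard(H8))
```

Starting from the 1 x 1 matrix, two doublings give the 4 x 4 matrix of the remark, and repeated doubling gives examples of every size that is a power of 2.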



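8. Let $A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$.

Find an invertible matrix P and a diagonal matrix D such that $P^\top AP =
I$ and $P^\top BP = D$, where I is the identity matrix.

First we take the quadratic form corresponding to A, and reduce it to a sum of
squares. The form is $x^2 + 2xy + 2y^2$, which is $(x + y)^2 + y^2$. (Note: This is the sum
of two squares, in agreement with the fact that A is positive definite.)

Now the matrix that transforms $(x, y)$ to $(x + y, y)$ is $Q = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, since

\[
\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} x + y \\ y \end{bmatrix}.
\]

Hence

\[
[x\ \ y]\, Q^\top Q \begin{bmatrix} x \\ y \end{bmatrix}
= x^2 + 2xy + 2y^2
= [x\ \ y]\, A \begin{bmatrix} x \\ y \end{bmatrix},
\]

so that $Q^\top Q = A$.

Now, if we put $P = Q^{-1} = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}$, we see that $P^\top AP = P^\top (Q^\top Q) P = I$.

What about $P^\top BP$? We find that

\[
P^\top BP =
\begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = D,
\]

the required diagonal matrix. So we are done.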
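
The result is easy to confirm by direct multiplication. The sketch below takes A to be the matrix of the quadratic form $x^2 + 2xy + 2y^2$ and B to be the matrix with rows (1, 1) and (1, 0), as appears in the product computed in the solution; treat these entries as an assumption of the example. It also evaluates det(xA - B) at the two diagonal entries of D as a cross-check. The helper names are hypothetical.

```python
def mat_mul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


A = [[1, 1], [1, 2]]
B = [[1, 1], [1, 0]]
P = [[1, -1], [0, 1]]

PtAP = mat_mul(transpose(P), mat_mul(A, P))
PtBP = mat_mul(transpose(P), mat_mul(B, P))
print(PtAP, PtBP)


def char_value(x):
    """Value of det(xA - B) at the number x."""
    return det2([[x * A[i][j] - B[i][j] for j in range(2)] for i in range(2)])


print(char_value(1), char_value(-1))
```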



Remark 1: In general it is not so easy. The reduction of the quadratic form will
give a matrix $P_1$ such that $P_1^\top A P_1 = I$, but in general $P_1^\top B P_1$ won't be diagonal;
all we can say is that it is symmetric. So by the Spectral Theorem, we can find an
orthogonal matrix $P_2$ such that $P_2^\top (P_1^\top B P_1) P_2$ is diagonal. ($P_2$ is the matrix whose
columns are orthonormal eigenvectors of $P_1^\top B P_1$.) Then because $P_2$ is orthogonal,
we have

\[
P_2^\top (P_1^\top A P_1) P_2 = P_2^\top I P_2 = I,
\]

so that $P = P_1 P_2$ is the required matrix.



Remark 2: If you are only asked for the diagonal matrix D, and not the matrix
P, you can do an easier calculation. We saw in the lectures that the diagonal
entries of D are the roots of the polynomial equation $\det(xA - B) = 0$. In our case, we have

\[
\begin{vmatrix} x - 1 & x - 1 \\ x - 1 & 2x \end{vmatrix} = x^2 - 1,
\]

so the diagonal entries of D are $+1$ and $-1$ (as we found).